# The Perfect Ratio
##### A SP2273 project by Prathy, Jin Chi, Jun Yuan and Hai
***
1. Introduction
2. Context - Mendelian Genetics
a. Mendel's Law of Segregation
b. Mendel's Law of Independent Assortment
3. Different Models of Inheritance
a. Monohybrid
b. Dihybrid
c. Trihybrid
d. Our Model
4. Epistasis
a. Epistasis
c. Epistasis Between More Than 2 Genes
a. Strengths
b. Limitations
c. Extensions
6. Conclusion
7. References
Genetics is the scientific study of genes and heredity. It covers a wide range of knowledge spanning from observable inheritance patterns to the biomolecular basis of the variation of traits. While certain traits (also known as phenotypes) in living organisms can be directly understood by elucidating the biomolecular basis through analysing related DNA sequences (genotype) of the trait, there remain traits that are harder to comprehend.
There are many instances where the genotype does not directly provide information about the phenotype, one of which is the phenomenon of epistasis. Epistasis refers to the process where two or more genes interact with one another, thus influencing the resulting phenotype [1].
Because the interactions between genes in epistasis are often unknown, it is difficult to determine the pattern of inheritance when the phenotypic ratios deviate from the Classical Mendelian Ratios (see Context and Assumptions). We therefore set out to construct a model that takes the effects of epistasis into account when predicting the phenotypic ratio of the offspring from the genotypes of the parents.
The foundations governing modern genetics are based upon three fundamental laws postulated by Gregor Mendel:
a) Mendel's Law of Segregation
b) Mendel's Law of Independent Assortment
c) Mendel's Law of Dominance
Mendel's Law of Segregation states that a single characteristic of an organism is controlled by two copies (or alleles*) of a gene. These copies separate during the formation of gametes (sex cells, e.g. sperm or egg cells), and only one of them is passed on to the offspring (Figure 1).
*Definition: Alleles are the variants of a particular gene. They are different forms of the gene and contribute to different characteristics. A convention in the field of genetics is to use a single letter to denote both alleles of a particular gene. For example, in Figure 1, T and t are two different alleles of the same gene.
Images source: https://teaching.ncl.ac.uk/bms/wiki/index.php/File:The_Principle_of_Segregation_(diagram).png
Figure 1 demonstrates meiosis, the process of gamete formation. Meiosis explains the molecular basis of Mendel's Law of Segregation: the two alleles, denoted T and t in Figure 1, are separated, so that each gamete receives either the T or the t allele of gene T.
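As a rough illustration of segregation, the sketch below draws gametes at random from a heterozygous "Tt" parent. The function name `gametes` and the sample size are assumptions made for this example only; they are not part of the project's model.

```python
import random
from collections import Counter

def gametes(genotype):
    """Return the two possible gametes of a single-gene genotype such as "Tt"."""
    return [genotype[0], genotype[1]]

# Simulate many gamete-formation events for a heterozygous parent.
random.seed(0)  # fixed seed only so that the run is repeatable
counts = Counter(random.choice(gametes("Tt")) for _ in range(10_000))
print(counts)  # roughly half of the gametes carry "T" and half carry "t"
```

Each gamete carries exactly one of the two alleles, and over many draws the two alleles appear about equally often.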
Mendel's Law of Independent Assortment states that the allele of a gene will be randomly combined with the allele of another gene in gamete formation (Figure 2).
Image sources: https://www.shaalaa.com/images/_4:9d02372f443c4bdea9462e88494383cd.png
With reference to Figure 2, the possible alleles Y and y of gene Y are combined independently with the alleles R and r of gene R (assuming the two genes are on different chromosomes) during gamete production. The parents' genotypes are YYRR and yyrr, producing a first (F$_1$) generation whose genotype is always YyRr. When the F$_1$ generation self-fertilizes, each gamete receives one allele of gene Y (Y or y) and one allele of gene R (R or r). Because these alleles combine at random, four types of gametes arise (YR, Yr, yR and yr), as seen in Figure 2. This process is called independent assortment.
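The gamete types described above can be enumerated with `itertools.product`. This is a minimal sketch that assumes unlinked genes and a genotype string with two letters per gene; `possible_gametes` is a hypothetical helper name.

```python
from itertools import product

def possible_gametes(genotype):
    """List the distinct gametes for a genotype string such as "YyRr"."""
    # Split the string into one two-letter pair per gene: ["Yy", "Rr"].
    genes = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    # One allele is drawn from each gene; the set collapses homozygous duplicates.
    return sorted({"".join(alleles) for alleles in product(*genes)})

print(possible_gametes("YyRr"))  # the F1 plant: four gamete types
print(possible_gametes("YYRR"))  # a homozygous parent: a single gamete type
```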
We have seen so far that alleles of a gene can be written in either CAPITAL LETTERS or small letters. This is explained by Mendel's Law of Dominance. The law states that when the genotype of an individual is heterozygous (having two different alleles of the same gene), one allele will conceal the expression of the other (Figure 3). A convention in genetics is that the DOMINANT (the concealer) allele is denoted with a CAPITAL LETTER while the RECESSIVE (the concealed) allele is denoted with a small letter ("T" and "t" in the example above).
Image source: https://d20khd7ddkh5ls.cloudfront.net/genetics_inheritance_dominant_cherry.jpg
Figure 3 shows the relationship between genotypes and phenotypes according to Mendel's Law of Dominance, where the homozygous (having the same alleles) genotype AA and aa result in red and yellow cherries respectively. However, for the genotype Aa, the cherry is red in color. This means that the expression of the a allele is being concealed. Therefore, the a allele is called a recessive allele whilst the A allele is called a dominant allele because it is expressed in the heterozygous genotype.
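The rule in Figure 3 can be written as a one-line check: a genotype shows the dominant phenotype as soon as it contains one capital-letter allele. The sketch below assumes complete dominance; the function name and the default trait labels are illustrative only.

```python
def phenotype(genotype, dominant_trait="red", recessive_trait="yellow"):
    """Map a single-gene genotype such as "Aa" to its phenotype under complete dominance."""
    has_dominant = any(allele.isupper() for allele in genotype)
    return dominant_trait if has_dominant else recessive_trait

for g in ("AA", "Aa", "aa"):
    print(g, "->", phenotype(g))
```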
This section focuses on the presentation of the Classical Mendelian Phenotypic Ratio. The Punnett Square method is applied to illustrate how phenotypic ratios are obtained as a result of different types of crosses. These results will then be used to examine the accuracy of our model.
In the following examples, we examine P and p as the first set of dominant and recessive alleles, Q and q as the second set of alleles, R and r as the third set of alleles.
A monohybrid cross models the inheritance of a single gene. The gene has two alleles, P and p. When gametes are formed, they carry one allele each. A zygote is formed from the fusion of two parental gametes, which means that the offspring carries one allele (P or p) from each parent. The Punnett square below shows all possible combinations of the gametes of two heterozygous (Pp) parents to form offspring genotypes. Since the P allele is assumed to be dominant over the p allele, the PP and Pp genotypes produce the same phenotype, whereas the pp genotype produces a different phenotype, as indicated in the Punnett Square below. Hence, the final phenotypic ratio obtained is 3:1.
Punnett Square:
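The four cells of this square can also be enumerated in code. The following is a small sketch for the Pp x Pp cross only; the variable names are chosen for illustration.

```python
from collections import Counter
from itertools import product

parent1, parent2 = "Pp", "Pp"

# Every cell of the Punnett square pairs one gamete from each parent.
# sorted() puts the capital letter first, so "pP" is written as "Pp".
offspring = ["".join(sorted(a + b)) for a, b in product(parent1, parent2)]
genotype_counts = Counter(offspring)

# Under complete dominance, any genotype containing "P" shows the dominant phenotype.
phenotype_counts = Counter("dominant" if "P" in g else "recessive" for g in offspring)

print(genotype_counts)   # 1 PP : 2 Pp : 1 pp
print(phenotype_counts)  # 3 dominant : 1 recessive
```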
A dihybrid cross models the same inheritance as the monohybrid cross, but instead of investigating a single gene, two genes are modelled. The offspring phenotypic ratio is 9:3:3:1.
Punnett Square:
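The 16 cells of the dihybrid square can be counted in the same spirit. This sketch assumes unlinked genes, complete dominance and a self-cross of one parent genotype; `gametes`, `phenotype` and `phenotype_ratio` are hypothetical helper names.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """All gametes of a genotype such as "PpQq", one allele per gene."""
    genes = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(g) for g in product(*genes)]

def phenotype(gamete1, gamete2):
    """Per gene: True if either gamete carries the dominant (capital) allele."""
    return tuple(a.isupper() or b.isupper() for a, b in zip(gamete1, gamete2))

def phenotype_ratio(parent):
    """Phenotypic ratio of parent x parent, largest class first."""
    counts = Counter(phenotype(g1, g2) for g1 in gametes(parent) for g2 in gametes(parent))
    return sorted(counts.values(), reverse=True)

print(phenotype_ratio("PpQq"))  # [9, 3, 3, 1]
```

Passing a three-gene genotype such as "PpQqRr" to the same function enumerates the 64 cells of a trihybrid square.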
Similarly, when looking at three genes (P, Q and R), we can use a trihybrid cross to model their patterns of inheritance. The phenotypic ratio in the offspring will be 27:9:9:9:3:3:3:1.
Punnett Square:
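The monohybrid, dihybrid and trihybrid ratios follow one pattern. As a worked derivation, assume $n$ unlinked genes, complete dominance and two parents that are heterozygous at every gene. Each gene independently contributes 3 parts dominant to 1 part recessive, so the phenotype classes are the terms of a binomial expansion:

$$
(3 + 1)^n = \sum_{k=0}^{n} \binom{n}{k}\, 3^{k} = 4^{n}
$$

Here $\binom{n}{k}$ is the number of phenotype classes showing exactly $k$ dominant traits, and each such class has weight $3^{k}$. For $n = 3$ this gives one class of 27, three classes of 9, three classes of 3 and one class of 1, which add up to $4^{3} = 64$ and reproduce the ratio 27:9:9:9:3:3:3:1.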
We first look at how to calculate the **genotypic ratio** when crossing any two given parents, according to Mendel's classical models of inheritance.
This is an important step before calculating the desired offspring phenotypic ratio: once we have obtained the genotypic ratio, we only need to combine the genotypes that give rise to the same phenotype and then add up their shares to obtain the phenotypic ratio.
To begin with, we recognise the need for a more convenient way of storing the alleles in a genotype (for instance, "A" and "a", "B" and "b") in order to not excessively use the .isupper() and .islower() methods. Thus, we realise that it is better to convert genotypes (e.g., "AaBBCc") into a tuple containing tuples of 0s and 1s.
## Some important modules imported for future use ##
import numpy as np
import itertools
import pandas as pd
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
def convert_letter_to_binary(letter):
    '''
    Auxiliary function, converting letters (e.g., "A") into binaries, with a dominant allele being 1 and a recessive
    allele being 0. For instance, "A" will translate to 1 while "a" will translate to 0.
    '''
    if letter.islower():
        return 0
    else:
        return 1

def convert_geno_to_tup(geno):
    '''
    Auxiliary function, converting a genotype to a tuple of tuples of binaries. For example, "AaBb" will translate
    to ((1, 0), (1, 0))
    '''
    tup = ()
    final_tup = ()
    for i in range(0, len(geno), 2): #split the genotype into pairs of alleles, one pair per gene
        tup += ((geno[i], geno[i+1]),)
    for i in tup: #convert each pair of letters into a pair of 0s and 1s
        final_tup += (tuple(map(convert_letter_to_binary, i)),)
    return final_tup
## Sample implementation ##
###########################
letter = "a"
geno = "AabbCCddEe"
print(convert_letter_to_binary(letter))
print(convert_geno_to_tup(geno))
0
((1, 0), (0, 0), (1, 1), (0, 0), (1, 0))
Next, to calculate the genotypic ratio for multiple genes, we start with the calculation of the genotypic ratio of the F1 generation for a single gene with two alleles (for instance, gene A gives rise to the "AA", "Aa" and "aa" genotypes).
def gen_ratio_single_gene(geno1, geno2):
    '''
    Returns the genotypic ratio of a single gene of two alleles ("Aa"). geno1 is the genotype of parent 1,
    geno2 is the genotype of parent 2, each given as a tuple of binaries such as (1, 0). Ratio is represented as a
    tuple of 3 elements (x, y, z). x represents the *odds* of HOMOZYGOUS DOMINANT, y represents HETEROZYGOUS, and z
    represents HOMOZYGOUS RECESSIVE. To understand how the *odds* are arrived at, see *blue box*
    '''
    ## put the dominant allele first so that (0, 1) is treated the same as (1, 0)
    geno1 = tuple(sorted(geno1, reverse=True))
    geno2 = tuple(sorted(geno2, reverse=True))
    if geno1 == geno2: ## if the genotypes are the same between 2 parents
        if geno1 == (0, 0): ## and if the genotypes are homozygous recessive
            ratio = (0, 0, 1) ## then ratio is 1 (or 100%) homozygous recessive in F1
        elif geno1 == (1, 1): ## else if the genotypes are homozygous dominant
            ratio = (1, 0, 0) ## then ratio is 1 (or 100%) homozygous dominant in F1
        else: ## if the genotypes are both heterozygous
            ratio = (0.25, 0.5, 0.25) ## then ratio is 25-50-25 in F1
    else: ## if the genotypes are different
        if (geno1 == (0, 0) and geno2 == (1, 1)) or (geno1 == (1, 1) and geno2 == (0, 0)):
            ratio = (0, 1, 0) #if one parent is homozygous dominant and the other recessive, ratio is 100% heterozygous
        elif (geno1 == (1, 0) and geno2 == (1, 1)) or (geno1 == (1, 1) and geno2 == (1, 0)):
            ratio = (0.5, 0.5, 0) #if one parent is homozygous dominant and the other heterozygous
        else: # if one parent is homozygous recessive and the other heterozygous
            ratio = (0, 0.5, 0.5)
    return ratio
## Sample implementation ##
###########################
geno1 = (0, 0) #"aa" in tuple form, as expected by gen_ratio_single_gene
geno2 = (1, 0) #"Aa" in tuple form
print(gen_ratio_single_gene(geno1, geno2))
(0, 0.5, 0.5)
Having laid the foundations for the calculation of genotypic ratio for a single gene, we attempt to implement a function that can calculate the ratio for **any number of genes**, using the defined functions.
def multiply_ratios(ratio1, ratio2): #to be used in gen_ratio_multiple_gene
    '''
    Auxiliary function, returns a new tuple that contains the products achieved when multiplying each element of ratio1
    by each element of ratio2. For example, multiply_ratios((1, 2, 3), (0, 0, 1)) returns (0, 0, 1, 0, 0, 2, 0, 0, 3)
    '''
    new_tup = ()
    for i in ratio1:
        for j in ratio2:
            new_tup += (i*j,)
    return new_tup

def get_all_possible_combinations_of_letters(parent_genotype): #to be used in gen_ratio_multiple_gene
    '''
    Returns a list of all possible combinations of lowercase and uppercase letters in parent_genotype to form genotypes.
    For instance, if parent_genotype = "AaBbCc", meaning it contains letters a, b and c, the function will return:
    [('AA', 'BB', 'CC'), ('AA', 'BB', 'Cc'), ('AA', 'BB', 'cc'), ('AA', 'Bb', 'CC'), ('AA', 'Bb', 'Cc'), ('AA', 'Bb', 'cc'),
    ('AA', 'bb', 'CC'), ('AA', 'bb', 'Cc'), ('AA', 'bb', 'cc'), ('Aa', 'BB', 'CC'), ('Aa', 'BB', 'Cc'), ('Aa', 'BB', 'cc'),
    ('Aa', 'Bb', 'CC'), ('Aa', 'Bb', 'Cc'), ('Aa', 'Bb', 'cc'), ('Aa', 'bb', 'CC'), ('Aa', 'bb', 'Cc'), ('Aa', 'bb', 'cc'),
    ('aa', 'BB', 'CC'), ('aa', 'BB', 'Cc'), ('aa', 'BB', 'cc'), ('aa', 'Bb', 'CC'), ('aa', 'Bb', 'Cc'), ('aa', 'Bb', 'cc'),
    ('aa', 'bb', 'CC'), ('aa', 'bb', 'Cc'), ('aa', 'bb', 'cc')]
    '''
    n = int(len(parent_genotype)/2) #n is the number of genes we are dealing with
    letters = [] #generate all possible single gene genotypes ("AA", "Aa" and "aa") from the given genes in parent
    letters_upper = [(parent_genotype[i:i+2]).upper() for i in range(0, len(parent_genotype), 2)] #get all the big letters ("AA", "BB", "CC")
    letters_lower = [(parent_genotype[i:i+2]).lower() for i in range(0, len(parent_genotype), 2)] #get all the small letters ("aa", "bb", "cc")
    letters_mixed = [parent_genotype[i].upper() + parent_genotype[i + 1].lower() for i in range(0, len(parent_genotype), 2)] #get all the mixed letters ("Aa", "Bb", "Cc")
    for i in range(n):
        letters.extend([[letters_upper[i], letters_mixed[i], letters_lower[i]]]) #combine them all together in letters list
    all_combinations = list(itertools.product(*letters)) #generate all possible combinations of the letters to form genotypes
    return all_combinations
def gen_ratio_multiple_gene(parent1, parent2):
    '''
    Returns the genotypic ratio of multiple genes of two alleles ("AaBbCc"). Ratio is represented in a dictionary whereby
    the keys are the genotypes while the values are the probability of the genotypes.
    '''
    n = int(len(parent1)/2) #n is the number of genes we are dealing with, assuming parent1 and parent2 have same genes
    all_combinations = get_all_possible_combinations_of_letters(parent1)
    genotypes = []
    for combination in all_combinations:
        combination = "".join(combination) #join the letters together to form a proper genotype
        genotypes.append(combination) #put the combination into possible list of genotypes
    parent1 = convert_geno_to_tup(parent1) #convert genotype of parent1 to tuple
    parent2 = convert_geno_to_tup(parent2) #convert genotype of parent2 to tuple
    ratios = []
    for i in range(len(parent1)):
        ratios += (gen_ratio_single_gene(parent1[i], parent2[i]),) #get ratios from combining genes from parent1 with genes from parent2 one by one
    # The loop below is an important step. Here, we combine the individual ratios for each gene all together,
    # to form the final ratios of genotype. This part will be elaborated on further (see blue box below)
    for i in range(len(ratios) - 1):
        ratios[0] = multiply_ratios(ratios[0], ratios[1])
        del ratios[1]
    gen_ratio = ratios[0]
    pairs = zip(genotypes, gen_ratio) #pairing up all the possible genotypes with their respective ratios
    gen_probs = dict(pairs) #make it into a dictionary
    copy = gen_probs.copy()
    for key in copy.keys():
        if gen_probs.get(key) == 0: #if the probability of a genotype is 0, remove it from the dictionary
            gen_probs.pop(key)
    return gen_probs
def display_gen(gen_ratio):
    '''
    Displays genotypic ratios in a table form
    '''
    print("{:<30} {:<20}".format('F1 Genotype', 'Probability'))
    for k, v in gen_ratio.items():
        print("{:<30} {:<20}".format(k, v))
## Sample implementation ##
###########################
parent1 = input("Input genotype of parent 1: ")
parent2 = input("Input genotype of parent 2: ")
gen_ratio = gen_ratio_multiple_gene(parent1, parent2)
display_gen(gen_ratio)
Input genotype of parent 1: AaBBCcDd
Input genotype of parent 2: AAbbCcDD
F1 Genotype                    Probability
AABbCCDD                       0.0625
AABbCCDd                       0.0625
AABbCcDD                       0.125
AABbCcDd                       0.125
AABbccDD                       0.0625
AABbccDd                       0.0625
AaBbCCDD                       0.0625
AaBbCCDd                       0.0625
AaBbCcDD                       0.125
AaBbCcDd                       0.125
AaBbccDD                       0.0625
AaBbccDd                       0.0625
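The combination of per-gene ratios into a multi-gene ratio can be cross-checked with a shorter, independent sketch. It is not the implementation above: it assumes unlinked genes, enumerates the four allele pairings per gene directly, and uses `fractions.Fraction` so that the probabilities stay exact. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def single_gene_distribution(pair1, pair2):
    """Genotype probabilities for one gene, e.g. ("Aa", "AA") gives AA: 1/2, Aa: 1/2."""
    dist = {}
    for a, b in product(pair1, pair2):
        key = "".join(sorted(a + b))  # capital letter first, so "aA" becomes "Aa"
        dist[key] = dist.get(key, Fraction(0)) + Fraction(1, 4)
    return dist

def genotype_distribution(parent1, parent2):
    """Multiply the per-gene distributions together, one gene at a time."""
    combined = {"": Fraction(1)}
    for i in range(0, len(parent1), 2):
        gene = single_gene_distribution(parent1[i:i + 2], parent2[i:i + 2])
        combined = {g + k: p * q for g, p in combined.items() for k, q in gene.items()}
    return combined

dist = genotype_distribution("AaBBCcDd", "AAbbCcDD")
print(len(dist), sum(dist.values()))  # number of genotypes and total probability
print(dist["AABbCcDD"])               # 1/8, i.e. 0.125 as in the table above
```

Because genes are assumed independent, the probability of a multi-gene genotype is simply the product of its single-gene probabilities, and the total always sums to 1.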
We can now move on to another important part of our project - **how to calculate the phenotypic ratio** from the genotypic ratio attained. Since we have only considered crosses that follow the Classical Mendelian ratios without variations (no epistasis), we will go ahead and design two functions: a function that converts any genotype into its corresponding phenotype (according to Mendel's ratios, without any deviation) and a function that can calculate the phenotypic ratio given the two parents' genotypes.
For simplicity, we will refer to phenotypes for homozygous dominant and heterozygous as "GENE XXX IS EXPRESSED" and phenotypes for homozygous recessive as "GENE XXX NOT EXPRESSED".
Firstly, we define the function that can convert any genotype into its phenotype.
def convert_gen_to_phen(genotype):
    phen = ""
    for i in range(0, len(genotype), 2):
        if genotype[i].isupper() or genotype[i + 1].isupper(): #as long as there is a dominant allele (in either position)
            phen += f"{genotype[i].upper()} expressed - " #gene is expressed
        else:
            phen += f"{genotype[i].upper()} not expressed - " #if there is no dominant allele, gene is not expressed
    phen = phen[:-3] #remove the last " - "
    return phen
## Sample implementation ##
###########################
genotype = input("Input genotype of any size: ")
print(convert_gen_to_phen(genotype))
Input genotype of any size: AaBBccDdEEFfGghh
A expressed - B expressed - C not expressed - D expressed - E expressed - F expressed - G expressed - H not expressed
Now that we are able to convert any genotype to its corresponding phenotype, we can define a function to calculate the phenotypic ratio for **any** genotypic combination between parents.
def phen_ratio_multiple_genes(parent1, parent2):
    '''
    Returns a dictionary representing the phenotypic ratio of the cross. The keys of the dictionary are the possible
    phenotypes and the values are the ratios.
    '''
    n = int(len(parent1)/2) #n represents the number of genes we are handling
    gen_ratio = gen_ratio_multiple_gene(parent1, parent2) #retrieve the genotypic ratio dictionary formed by these two parents' genotypes
    genotypes = gen_ratio.keys() #extract only the possible genotypes
    phenotypes = {}
    for geno in genotypes:
        pheno = convert_gen_to_phen(geno) #convert the genotype to phenotype
        # Since gen_ratio[geno] only returns the probability of the genotype, we want to turn it into odds
        # by applying a simple formula: probability*4**n.
        # (4**n is the total number of non-distinct genotypic combinations (there is some overlap))
        # This will ensure that the odds are whole integers and not fractions.
        phenotypes[pheno] = phenotypes.get(pheno,0) + int(gen_ratio[geno]*4**n) #attach integer values to each phenotype; get() gives phenotypes not yet seen a default value of 0
    sorted_phen = dict(sorted(phenotypes.items(), key = lambda x: x[1], reverse = True)) #sort in descending order of ratio
    return sorted_phen

def display_phen(phen_ratio):
    '''
    Displays phenotypic ratios in a table form
    '''
    print("{:<100} {:<20}".format('F1 Phenotype', 'Ratio'))
    for k, v in phen_ratio.items():
        print("{:<100} {:<20}".format(k, v))
## Sample implementation ##
###########################
parent1 = input("Input parent 1's genotype: ")
parent2 = input("Input parent 2's genotype: ")
phen_ratio = phen_ratio_multiple_genes(parent1, parent2)
display_phen(phen_ratio)
Input parent 1's genotype: AaBbCcDd
Input parent 2's genotype: AaBbCcDd
F1 Phenotype                                                            Ratio
A expressed - B expressed - C expressed - D expressed                   81
A expressed - B expressed - C expressed - D not expressed               27
A expressed - B expressed - C not expressed - D expressed               27
A expressed - B not expressed - C expressed - D expressed               27
A not expressed - B expressed - C expressed - D expressed               27
A expressed - B expressed - C not expressed - D not expressed           9
A expressed - B not expressed - C expressed - D not expressed           9
A expressed - B not expressed - C not expressed - D expressed           9
A not expressed - B expressed - C expressed - D not expressed           9
A not expressed - B expressed - C not expressed - D expressed           9
A not expressed - B not expressed - C expressed - D expressed           9
A expressed - B not expressed - C not expressed - D not expressed       3
A not expressed - B expressed - C not expressed - D not expressed       3
A not expressed - B not expressed - C expressed - D not expressed       3
A not expressed - B not expressed - C not expressed - D expressed       3
A not expressed - B not expressed - C not expressed - D not expressed   1
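Scaling every probability by 4**n always yields whole numbers, but not always the smallest ones: a cross such as Aa x aa would be reported as 2:2 rather than 1:1. One possible refinement, sketched below under the assumption that the probabilities are exact binary fractions (as they are in this model), reduces the ratio to lowest terms. The helper `to_whole_number_ratio` is hypothetical and not part of the code above.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def to_whole_number_ratio(probabilities):
    """Scale a dict of probabilities to the smallest whole-number ratio."""
    fracs = {k: Fraction(v) for k, v in probabilities.items()}
    # Least common multiple of all denominators, so every scaled value is an integer.
    common = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in fracs.values()), 1)
    scaled = {k: int(f * common) for k, f in fracs.items()}
    # Divide by the greatest common divisor to remove any remaining shared factor.
    divisor = reduce(gcd, scaled.values())
    return {k: v // divisor for k, v in scaled.items()}

print(to_whole_number_ratio({"expressed": 0.75, "not expressed": 0.25}))
print(to_whole_number_ratio({"expressed": 0.5, "not expressed": 0.5}))
```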
Numbers are sometimes not the most effective way of visualizing the ratios of different phenotypes. Thus, we decide to include a method for visualization via the construction of labeled and coloured Punnett Squares.
## Initialize a colour palette to be used ##
ALL_COLOURS = [mcolors.XKCD_COLORS[x] for x in mcolors.XKCD_COLORS.keys() if "black" not in x]
def delete_same_letters(lst): #to be used in cross_table()
    '''
    Auxiliary function that deletes elements of the list if the elements contain repeating letters
    '''
    copy = lst.copy()
    for elem in copy:
        for i, j in enumerate(elem):
            if j.upper() in elem[i + 1:] or j.lower() in elem[i + 1:]:
                lst.remove(elem)
                break
    return lst
def cross_table(parent1, parent2):
    '''
    Returns a Punnett Square representing the odds of each phenotype and legends that include name of phenotype, colour
    and ratio.
    '''
    parent1_gametes = list(itertools.combinations(parent1, int(len(parent1) / 2))) #creates all possible combinations of n letters in the genotype of parent1, where n is the number of genes in genotype
    parent1_gametes = delete_same_letters(parent1_gametes) #delete those combinations with repeated letters, only those with distinct letters remain -> these are the possible combinations to form gametes
    parent2_gametes = list(itertools.combinations(parent2, int(len(parent2) / 2))) #same as above
    parent2_gametes = delete_same_letters(parent2_gametes) #same as above
    for i in range(len(parent1_gametes)):
        parent1_gametes[i] = ''.join(parent1_gametes[i]) + f' ({i + 1})' #join the letters in each combination together to form gametes, with the index appended at the back
    for i in range(len(parent2_gametes)):
        parent2_gametes[i] = ''.join(parent2_gametes[i]) + f' ({i + 1})' #same as above
    square = pd.DataFrame(columns=pd.MultiIndex.from_product([['Parent 2 Gametes'], #create a dataframe that represents the Punnett Square
                                                              parent2_gametes]),
                          index=pd.MultiIndex.from_product([['Parent 1 Gametes'],
                                                            parent1_gametes]))
    legends = {"phenotypes" : [], "colours" : [], "ratios": [] } #create a dictionary of legends
    phenotype_with_ratios = phen_ratio_multiple_genes(parent1, parent2)
    phenotypes = list(phenotype_with_ratios.keys())
    genotypes = [] #see yellow box below for reasons why we need to create a new list of genotypes from gametes
    for row in range(len(parent1_gametes)):
        for column in range(len(parent2_gametes)):
            # This part joins the gametes from each parent together to form unordered genotypes (the " (i)" label is
            # sliced off first). The reformed genotypes then sort the two letters of each gene so that the capital
            # letter is at the front.
            genotype = ''.join([''.join(i) for i in zip(parent1_gametes[row][:len(parent1_gametes[0]) - 4], parent2_gametes[column][:len(parent2_gametes[0]) - 4])])
            reformed_genotype = ''
            for i in range(0, len(genotype), 2):
                reformed_genotype += ''.join(sorted(genotype[i:i+2]))
            square.iloc[row, column] = reformed_genotype #add the reformed genotypes into the dataframe
            genotypes.append(reformed_genotype)
    for geno in genotypes: #for each genotype in genotypes
        phenotype = convert_gen_to_phen(geno) #convert to phenotype
        colour = ALL_COLOURS[phenotypes.index(phenotype)] #the colour index is the position of the phenotype in the list of possible phenotypes
        if phenotype not in legends["phenotypes"]: #add the phenotype and its colour to legends if it is not already inside
            legends["phenotypes"].append(phenotype)
            legends["colours"].append(colour)
    for pheno in legends["phenotypes"]:
        legends["ratios"].append(phenotype_with_ratios[pheno]) #add to legends of each phenotype's ratio
    legends_table = pd.DataFrame.from_dict(legends).set_index("phenotypes").sort_values(by = "ratios", ascending = False)
    def cell_colour(genotype): #this function returns the background colour corresponding to each genotype
        phenotype = convert_gen_to_phen(genotype)
        index = legends["phenotypes"].index(phenotype) #using the index of the phenotype in legends dict
        colour = legends["colours"][index] #to get the same colour as the one recorded in the legends
        return f"background-color: {colour}"
    def legend_colour(colour): #this function returns the background colour corresponding to each legend entry
        return f"background-color: {colour}"
    # Note: Styler.applymap was deprecated in pandas 2.1 in favour of Styler.map
    square = square.style.applymap(cell_colour) #apply background colours to each genotype in square
    legends_table = legends_table.style.applymap(legend_colour, subset = "colours") #apply background colours to legends
    return square, legends_table
def display_cross(cross_table):
    '''
    Display the cross table
    '''
    return cross_table[0]

def display_legends(cross_table):
    '''
    Display the legends
    '''
    return cross_table[1]
## Sample implementation ##
###########################
parent1 = input("Input parent 1's genotype: ")
parent2 = input("Input parent 2's genotype: ")
cross = cross_table(parent1, parent2)
display_cross(cross)
Input parent 1's genotype: AaBbCcDd
Input parent 2's genotype: AaBbCcDd
| Parent 1 Gametes (rows) / Parent 2 Gametes (columns) | ABCD (1) | ABCd (2) | ABcD (3) | ABcd (4) | AbCD (5) | AbCd (6) | AbcD (7) | Abcd (8) | aBCD (9) | aBCd (10) | aBcD (11) | aBcd (12) | abCD (13) | abCd (14) | abcD (15) | abcd (16) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABCD (1) | AABBCCDD | AABBCCDd | AABBCcDD | AABBCcDd | AABbCCDD | AABbCCDd | AABbCcDD | AABbCcDd | AaBBCCDD | AaBBCCDd | AaBBCcDD | AaBBCcDd | AaBbCCDD | AaBbCCDd | AaBbCcDD | AaBbCcDd |
| ABCd (2) | AABBCCDd | AABBCCdd | AABBCcDd | AABBCcdd | AABbCCDd | AABbCCdd | AABbCcDd | AABbCcdd | AaBBCCDd | AaBBCCdd | AaBBCcDd | AaBBCcdd | AaBbCCDd | AaBbCCdd | AaBbCcDd | AaBbCcdd |
| ABcD (3) | AABBCcDD | AABBCcDd | AABBccDD | AABBccDd | AABbCcDD | AABbCcDd | AABbccDD | AABbccDd | AaBBCcDD | AaBBCcDd | AaBBccDD | AaBBccDd | AaBbCcDD | AaBbCcDd | AaBbccDD | AaBbccDd |
| ABcd (4) | AABBCcDd | AABBCcdd | AABBccDd | AABBccdd | AABbCcDd | AABbCcdd | AABbccDd | AABbccdd | AaBBCcDd | AaBBCcdd | AaBBccDd | AaBBccdd | AaBbCcDd | AaBbCcdd | AaBbccDd | AaBbccdd |
| AbCD (5) | AABbCCDD | AABbCCDd | AABbCcDD | AABbCcDd | AAbbCCDD | AAbbCCDd | AAbbCcDD | AAbbCcDd | AaBbCCDD | AaBbCCDd | AaBbCcDD | AaBbCcDd | AabbCCDD | AabbCCDd | AabbCcDD | AabbCcDd |
| AbCd (6) | AABbCCDd | AABbCCdd | AABbCcDd | AABbCcdd | AAbbCCDd | AAbbCCdd | AAbbCcDd | AAbbCcdd | AaBbCCDd | AaBbCCdd | AaBbCcDd | AaBbCcdd | AabbCCDd | AabbCCdd | AabbCcDd | AabbCcdd |
| AbcD (7) | AABbCcDD | AABbCcDd | AABbccDD | AABbccDd | AAbbCcDD | AAbbCcDd | AAbbccDD | AAbbccDd | AaBbCcDD | AaBbCcDd | AaBbccDD | AaBbccDd | AabbCcDD | AabbCcDd | AabbccDD | AabbccDd |
| Abcd (8) | AABbCcDd | AABbCcdd | AABbccDd | AABbccdd | AAbbCcDd | AAbbCcdd | AAbbccDd | AAbbccdd | AaBbCcDd | AaBbCcdd | AaBbccDd | AaBbccdd | AabbCcDd | AabbCcdd | AabbccDd | Aabbccdd |
| aBCD (9) | AaBBCCDD | AaBBCCDd | AaBBCcDD | AaBBCcDd | AaBbCCDD | AaBbCCDd | AaBbCcDD | AaBbCcDd | aaBBCCDD | aaBBCCDd | aaBBCcDD | aaBBCcDd | aaBbCCDD | aaBbCCDd | aaBbCcDD | aaBbCcDd |
| aBCd (10) | AaBBCCDd | AaBBCCdd | AaBBCcDd | AaBBCcdd | AaBbCCDd | AaBbCCdd | AaBbCcDd | AaBbCcdd | aaBBCCDd | aaBBCCdd | aaBBCcDd | aaBBCcdd | aaBbCCDd | aaBbCCdd | aaBbCcDd | aaBbCcdd |
| aBcD (11) | AaBBCcDD | AaBBCcDd | AaBBccDD | AaBBccDd | AaBbCcDD | AaBbCcDd | AaBbccDD | AaBbccDd | aaBBCcDD | aaBBCcDd | aaBBccDD | aaBBccDd | aaBbCcDD | aaBbCcDd | aaBbccDD | aaBbccDd |
| aBcd (12) | AaBBCcDd | AaBBCcdd | AaBBccDd | AaBBccdd | AaBbCcDd | AaBbCcdd | AaBbccDd | AaBbccdd | aaBBCcDd | aaBBCcdd | aaBBccDd | aaBBccdd | aaBbCcDd | aaBbCcdd | aaBbccDd | aaBbccdd |
| abCD (13) | AaBbCCDD | AaBbCCDd | AaBbCcDD | AaBbCcDd | AabbCCDD | AabbCCDd | AabbCcDD | AabbCcDd | aaBbCCDD | aaBbCCDd | aaBbCcDD | aaBbCcDd | aabbCCDD | aabbCCDd | aabbCcDD | aabbCcDd |
| abCd (14) | AaBbCCDd | AaBbCCdd | AaBbCcDd | AaBbCcdd | AabbCCDd | AabbCCdd | AabbCcDd | AabbCcdd | aaBbCCDd | aaBbCCdd | aaBbCcDd | aaBbCcdd | aabbCCDd | aabbCCdd | aabbCcDd | aabbCcdd |
| abcD (15) | AaBbCcDD | AaBbCcDd | AaBbccDD | AaBbccDd | AabbCcDD | AabbCcDd | AabbccDD | AabbccDd | aaBbCcDD | aaBbCcDd | aaBbccDD | aaBbccDd | aabbCcDD | aabbCcDd | aabbccDD | aabbccDd |
| abcd (16) | AaBbCcDd | AaBbCcdd | AaBbccDd | AaBbccdd | AabbCcDd | AabbCcdd | AabbccDd | Aabbccdd | aaBbCcDd | aaBbCcdd | aaBbccDd | aaBbccdd | aabbCcDd | aabbCcdd | aabbccDd | aabbccdd |
## Sample implementation ##
###########################
display_legends(cross)
| phenotypes | colours | ratios |
|---|---|---|
| A expressed - B expressed - C expressed - D expressed | #acc2d9 | 81 |
| A expressed - B expressed - C expressed - D not expressed | #56ae57 | 27 |
| A expressed - B expressed - C not expressed - D expressed | #b2996e | 27 |
| A expressed - B not expressed - C expressed - D expressed | #a8ff04 | 27 |
| A not expressed - B expressed - C expressed - D expressed | #69d84f | 27 |
| A expressed - B expressed - C not expressed - D not expressed | #894585 | 9 |
| A expressed - B not expressed - C expressed - D not expressed | #70b23f | 9 |
| A expressed - B not expressed - C not expressed - D expressed | #d4ffff | 9 |
| A not expressed - B expressed - C expressed - D not expressed | #65ab7c | 9 |
| A not expressed - B expressed - C not expressed - D expressed | #952e8f | 9 |
| A not expressed - B not expressed - C expressed - D expressed | #fcfc81 | 9 |
| A expressed - B not expressed - C not expressed - D not expressed | #a5a391 | 3 |
| A not expressed - B expressed - C not expressed - D not expressed | #388004 | 3 |
| A not expressed - B not expressed - C expressed - D not expressed | #4c9085 | 3 |
| A not expressed - B not expressed - C not expressed - D expressed | #5e9b8a | 3 |
| A not expressed - B not expressed - C not expressed - D not expressed | #efb435 | 1 |
While Mendel's ratios laid a good foundation for the study of inheritance, many geneticists found inconsistencies in these ratios. There are several reasons for the deviation in ratios, and one of them is epistasis. As mentioned in the Introduction, epistasis is a genetic phenomenon that refers to the interactions between genes that govern a phenotype [2]. These interactions can involve 2 or more genes. In the next sections, we explore the types of epistasis that come up most often in genetic studies.
As observed from the dihybrid cross in Section 3, the expected offspring phenotypic ratio when two heterozygous parents are crossed is 9:3:3:1. When epistasis is present, this ratio will be modified.
Epistasis between 2 genes encompasses a variety of interactions and ratios. One of the most common types is recessive epistasis. When a pair of recessive alleles at one gene position (locus) masks the expression of both the dominant and the recessive alleles at another locus, the phenomenon is called recessive epistasis and results in a 9:3:4 (instead of 9:3:3:1) ratio. Recessive epistasis is often seen in the coat colour of mice; one possible interaction of genes is shown below. With a dominant ‘C’ allele (C), enzyme C is produced, which is responsible for colour formation in mice. With a dominant ‘P’ allele (P), enzyme P is produced, which turns the coat black. When the homozygous recessive genotype cc is present, it masks the effects of the ‘P’ allele: no enzyme C is produced and no colour forms in the coat. The mice thus remain albino regardless of the alleles at gene P. The ‘cc’ genotype is therefore said to be epistatic to ‘P’ and ‘p’.
Albino → enzyme C → brown → enzyme P → black
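The 9:3:4 ratio can be reproduced by applying the pathway above to every cell of a CcPp x CcPp Punnett square. This is a sketch for this one example; `coat_colour` is a hypothetical helper, not the project's conversion function.

```python
from collections import Counter
from itertools import product

def coat_colour(genotype):
    """Phenotype for a two-gene genotype written with gene C first, e.g. "CcPp"."""
    c_gene, p_gene = genotype[:2], genotype[2:]
    if "C" not in c_gene:  # cc: no enzyme C, so gene P cannot show its effect
        return "albino"
    return "black" if "P" in p_gene else "brown"

gametes = ["CP", "Cp", "cP", "cp"]  # gametes of a CcPp parent
counts = Counter(
    coat_colour(g1[0] + g2[0] + g1[1] + g2[1])  # pair up the alleles gene by gene
    for g1, g2 in product(gametes, repeat=2)
)
print(counts)  # 9 black : 3 brown : 4 albino
```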
Dominant epistasis occurs when a dominant allele at one locus masks the expression of both the dominant and the recessive alleles at another locus. This results in a 12:3:1 ratio. The pathway shown below is observed in the colour of summer squash fruits. In the enzymatic pathway controlling the colour of the squash fruits, a dominant ‘B’ allele (B) produces enzyme B, which converts the precursor green pigment to yellow. With the homozygous recessive genotype (bb), the fruit remains green. However, this only takes place when gene A is homozygous recessive (aa). A dominant ‘A’ allele (A) produces an enzyme A that degrades the other pigments such that only white colour is expressed. Hence, a dominant ‘A’ allele masks the effects of gene B, and 'A_' is epistatic to 'B' and 'b'.
‘A’ allele (white) → degrades precursor green and yellow pigment. Precursor green – enzyme ‘B’→ yellow pigment
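Another way to see the 12:3:1 ratio is to start from the four classes of the ordinary 9:3:3:1 dihybrid ratio and merge the classes that look alike. The class labels (such as "A_B_") and the helper `squash_colour` below are illustrative assumptions.

```python
from collections import Counter

# The four phenotype classes of an AaBb x AaBb cross and their Mendelian weights.
mendelian_classes = {"A_B_": 9, "A_bb": 3, "aaB_": 3, "aabb": 1}

def squash_colour(genotype_class):
    """Apply dominant epistasis: any dominant A allele gives white fruit."""
    if genotype_class.startswith("A_"):
        return "white"
    return "yellow" if genotype_class.endswith("B_") else "green"

merged = Counter()
for genotype_class, weight in mendelian_classes.items():
    merged[squash_colour(genotype_class)] += weight
print(merged)  # 12 white : 3 yellow : 1 green
```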
When homozygous recessive alleles at either locus mask the expression of the dominant allele at the other locus, the result is complementary loci (or duplicate recessive) epistasis. It can also be described as the phenomenon where complementation between 2 genes is required for the production of a phenotype. This is seen in the flower colour of sweet pea plants [3]. Sweet pea flowers are either white or purple, and this is determined by their genes. As shown by the pathway below, the production of anthocyanin, the purple pigment, requires 2 steps: the conversion of a precursor to chromogen and the conversion of chromogen to anthocyanin. These steps require enzyme C and enzyme P respectively. When a dominant ‘C’ allele is present (C_), enzyme C is produced and the precursor (no colour) is converted to chromogen (no colour). When a dominant ‘P’ allele is also present (P_), enzyme P is produced and chromogen is converted to anthocyanin. Hence, both gene C and gene P need a dominant allele (C_P_) for purple colour to form in the flowers. A homozygous recessive genotype at gene C (cc) or at gene P (pp) results in white flowers. Hence, a dihybrid cross results in a 9:7 ratio in the F2 generation.
Precursor – (enzyme C) → chromogen – (enzyme P) → anthocyanin
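The 9:7 ratio follows from simple arithmetic: in a CcPp × CcPp cross each gene shows at least one dominant allele with probability 3/4, and purple needs both. A minimal sketch of that calculation, assuming the two loci assort independently:

```python
from fractions import Fraction

p_dominant = Fraction(3, 4)            # P(C_) in Cc x Cc, and likewise P(P_) in Pp x Pp
p_purple = p_dominant * p_dominant     # both enzymes are needed for anthocyanin
p_white = 1 - p_purple                 # every other genotype is white

print(p_purple, p_white)               # 9/16 7/16
```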
Note: Besides the three types of epistasis mentioned above, there are other types as well. The more common ones include **duplicate dominant genes** (15:1 ratio), **cumulative effect** (9:6:1 ratio) and **inhibitory factor** (13:3 ratio).4
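Each of these ratios can be read as a regrouping of the four classes of the classic 9:3:3:1 dihybrid ratio. The sketch below illustrates this under the assumption that A is the inhibiting gene in the inhibitory-factor case; the names `CLASSES`, `GROUPINGS` and `collapse` are hypothetical and not part of the project code.

```python
# Offspring classes of an AaBb x AaBb cross and their counts out of 16
CLASSES = {"A_B_": 9, "A_bb": 3, "aaB_": 3, "aabb": 1}

# Classes that share a phenotype under each type of interaction
GROUPINGS = {
    "duplicate dominant genes": [("A_B_", "A_bb", "aaB_"), ("aabb",)],
    "cumulative effect": [("A_B_",), ("A_bb", "aaB_"), ("aabb",)],
    "inhibitory factor": [("A_B_", "A_bb", "aabb"), ("aaB_",)],  # assumes A_ inhibits B
}

def collapse(groups):
    """Add up the class counts inside each phenotype group."""
    return [sum(CLASSES[c] for c in group) for group in groups]

for name, groups in GROUPINGS.items():
    print(name, ":".join(str(n) for n in collapse(groups)))
```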
The program, thus far, has been able to predict the genotypic and phenotypic ratios of any number/combination of two-allele genes, as well as visualize these relationships through the use of a Punnett Square.
To adjust the program for different interactions of any two genes (or different types of two-gene epistasis), we provide logical frameworks in the form of code for each type of conversion from genotype to phenotype according to the given epistasis, and, subsequently, modify the code calculating phenotypic ratios.
Firstly, we provide the conversion methods to account for FOUR types of epistasis evident in the previous section: dominant, recessive, complementary loci (or duplicate recessive) and (bonus) duplicate dominant epistasis.
def convert_gen_to_phen_dom_epi(genotype, pairs): #assuming independence of the "pairs"
    '''
    Returns the phenotype of the given genotype, provided with the pairs of epistatic genes ("pairs"). "pairs" is a
    tuple of smaller tuples, with each smaller tuple containing a pair of epistatic genes (no overlap among the pairs).
    Each smaller tuple is in the format (x, y), where x is epistatic to y.
    '''
    phen = ""
    pairings = {}
    for pair in pairs: #for each pair in pairs,
        pairings[pair[1].lower()] = pair[0].upper() #let the first gene (modifier) be the value and the second (modified) be the key
    for i in range(0, len(genotype), 2): #looping through every gene in genotype
        gene = genotype[i].lower()
        if genotype[i].isupper(): #if gene's first allele is dominant
            if gene in pairings and pairings[gene] in genotype: #if gene is modified and its modifier has a dominant allele in genotype
                phen += f"{genotype[i]} not expressed - " #then the gene is NOT expressed
            else:
                phen += f"{genotype[i]} expressed - " #if gene is NOT modified, or its modifier has no dominant allele, gene is expressed
        else:
            phen += f"{genotype[i].upper()} not expressed - " #if gene's first allele is recessive, gene is NOT expressed
    return phen[:-3] #drop the trailing " - "
def convert_gen_to_phen_rec_epi(genotype, pairs): #assuming independence of "pairs"
    '''
    Returns the phenotype of the given genotype, provided with the pairs of epistatic genes ("pairs"). "pairs" is a
    tuple of smaller tuples, with each smaller tuple containing a pair of epistatic genes (no overlap among the pairs).
    Each smaller tuple is in the format (x, y), where x is epistatic to y.
    '''
    phen = ""
    pairings = {}
    for pair in pairs: #for each pair in pairs,
        pairings[pair[1].upper()] = pair[0].lower() #let the first (modifier) gene be values, second (modified) be keys
    for i in range(0, len(genotype), 2): #looping through every gene in genotype
        gene = genotype[i]
        if gene.isupper(): #if gene's first allele is dominant
            if gene in pairings and pairings[gene]*2 in genotype: #if gene is modified and its modifier is homozygous recessive,
                phen += f"{gene} not expressed - " #gene is NOT expressed
            else:
                phen += f"{gene} expressed - " #if gene is not modified, or its modifier is NOT homozygous recessive, gene is expressed
        else:
            phen += f"{gene.upper()} not expressed - " #if gene is homozygous recessive, it is always NOT expressed
    return phen[:-3] #drop the trailing " - "
def convert_gen_to_phen_duprec_epi(genotype, pairs): #assuming independence of the "pairs"
    '''
    Returns the phenotype of the given genotype, provided with the pairs of epistatic genes ("pairs"). "pairs" is a
    tuple of smaller tuples, with each smaller tuple containing a pair of epistatic genes (no overlap among the pairs).
    Each smaller tuple is in the format (x, y), where x and y are duplicate recessive genes.
    '''
    phen = ""
    firsts = []
    seconds = []
    for pair in pairs:
        firsts += pair[0].lower() #create a list of first genes in each pair
        seconds += pair[1].lower() #create a list of second genes in each pair
    for i in range(0, len(genotype), 2): #looping through every gene in the genotype
        if genotype[i].isupper(): #if gene's first allele is dominant,
            if genotype[i].lower() in firsts: #if gene is in "firsts"
                index = firsts.index(genotype[i].lower())
                if seconds[index]*2 in genotype: #if corresponding gene in "seconds" is homozygous recessive in genotype
                    phen += f"{genotype[i]} not expressed - " #gene is NOT expressed
                else:
                    if genotype[i].lower() in seconds: #if gene is in "seconds" (in another pair)
                        index = seconds.index(genotype[i].lower())
                        if firsts[index]*2 in genotype: #if corresponding gene in "firsts" is homozygous recessive in genotype
                            phen += f"{genotype[i]} not expressed - " #gene is NOT expressed
                        else:
                            phen += f"{genotype[i]} expressed - " #else, gene is expressed
                    else:
                        phen += f"{genotype[i]} expressed - " #if gene is not in "seconds", gene is expressed
            elif genotype[i].lower() in seconds: #if gene is in "seconds"
                index = seconds.index(genotype[i].lower())
                if firsts[index]*2 in genotype: #if corresponding gene in "firsts" is homozygous recessive in genotype
                    phen += f"{genotype[i]} not expressed - " #gene is NOT expressed
                else:
                    phen += f"{genotype[i]} expressed - " #else, gene is expressed
            else:
                phen += f"{genotype[i]} expressed - " #if gene is not in any pair, gene is expressed
        else:
            phen += f"{genotype[i].upper()} not expressed - " #if gene is homozygous recessive, it is always NOT expressed
    return phen[:-3] #drop the trailing " - "
def convert_gen_to_phen_dupdom_epi(genotype, pairs): #assuming independence of the "pairs"
    '''
    Returns the phenotype of the given genotype, provided with the pairs of epistatic genes ("pairs"). "pairs" is a
    tuple of smaller tuples, with each smaller tuple containing a pair of epistatic genes (no overlap among the pairs).
    Each smaller tuple is in the format (x, y), where x and y are duplicate dominant genes.
    '''
    phen = ""
    firsts = []
    seconds = []
    for pair in pairs:
        firsts += pair[0].upper() #same as duprec, but stored in upper case
        seconds += pair[1].upper() #same as duprec, but stored in upper case
    for i in range(0, len(genotype), 2): #same as duprec
        if genotype[i].isupper():
            phen += f"{genotype[i]} expressed - " #if gene has a dominant allele, it is always expressed
        else: #else if gene is homozygous recessive
            if genotype[i].upper() in firsts: #if gene is in "firsts"
                index = firsts.index(genotype[i].upper())
                if seconds[index] in genotype: #if the corresponding gene in "seconds" has at least 1 dominant allele
                    phen += f"{genotype[i].upper()} expressed - " #gene is expressed
                else:
                    phen += f"{genotype[i].upper()} not expressed - " #else, gene is NOT expressed
            elif genotype[i].upper() in seconds: #same as the case above for gene in "firsts"
                index = seconds.index(genotype[i].upper())
                if firsts[index] in genotype:
                    phen += f"{genotype[i].upper()} expressed - "
                else:
                    phen += f"{genotype[i].upper()} not expressed - "
            else: #if gene is not in any pairs and gene is homozygous recessive
                phen += f"{genotype[i].upper()} not expressed - " #gene is NOT expressed
    return phen[:-3] #drop the trailing " - "
Next, we want to create a general tool that can convert genotype to phenotype with any of these four epistatic relationships. This function can easily be expanded to include more types of epistasis.
def convert_gen_to_phen_with_epi(geno, epis, pairs = None):
    '''
    Return the phenotype of any genotype with the given epistatic pattern and pairs of epistatic genes.
    '''
    if epis == "d": #dominant epistasis
        pheno = convert_gen_to_phen_dom_epi(geno, pairs)
    elif epis == "r": #recessive epistasis
        pheno = convert_gen_to_phen_rec_epi(geno, pairs)
    elif epis == "dr" or epis == "cl": #duplicate recessive / complementary loci epistasis
        pheno = convert_gen_to_phen_duprec_epi(geno, pairs)
    elif epis == "dd": #duplicate dominant epistasis
        pheno = convert_gen_to_phen_dupdom_epi(geno, pairs)
    elif epis == "0": #no epistasis
        pheno = convert_gen_to_phen(geno)
    else: #fail with a clear message instead of an unbound "pheno"
        raise ValueError(f"Unknown type of epistasis: {epis!r}")
    return pheno
## Sample implementation ##
###########################
geno = "AaBbCCddEe"
epis = "0" #no epistasis
pairs = ()
print("## No epistasis ##")
print(convert_gen_to_phen_with_epi(geno, epis, pairs))
epis = "d" #dominant epistasis
pairs = (("a", "b"), ("d", "e"))
print("## Dominant epistasis ##")
print(convert_gen_to_phen_with_epi(geno, epis, pairs))
epis = "r" #recessive epistasis
pairs = (("a", "b"), ("d", "e"))
print("## Recessive epistasis ##")
print(convert_gen_to_phen_with_epi(geno, epis, pairs))
epis = "dr" #complementary loci epistasis
pairs = (("a", "b"), ("d", "e"))
print("## Complementary loci/duplicate recessive epistasis ##")
print(convert_gen_to_phen_with_epi(geno, epis, pairs))
epis = "dd" #duplicate dominant epistasis
pairs = (("a", "b"), ("d", "e"))
print("## Duplicate dominant epistasis ##")
print(convert_gen_to_phen_with_epi(geno, epis, pairs))
## No epistasis ##
A expressed - B expressed - C expressed - D not expressed - E expressed
## Dominant epistasis ##
A expressed - B not expressed - C expressed - D not expressed - E expressed
## Recessive epistasis ##
A expressed - B expressed - C expressed - D not expressed - E not expressed
## Complementary loci/duplicate recessive epistasis ##
A expressed - B expressed - C expressed - D not expressed - E not expressed
## Duplicate dominant epistasis ##
A expressed - B expressed - C expressed - D expressed - E expressed
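One possible way to keep the general tool easy to extend is to look the converter up in a dictionary instead of a chain of `elif` branches. The sketch below is only an illustration of that design: `no_epistasis`, `CONVERTERS` and `convert_with_registry` are hypothetical names, and the single registered converter is a stand-in that assumes each gene is written with its capital letter first.

```python
def no_epistasis(genotype, pairs=None):
    """Stand-in converter: a gene is expressed if it carries a dominant allele."""
    genes = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return " - ".join(
        f"{g[0].upper()} expressed" if g[0].isupper() else f"{g[0].upper()} not expressed"
        for g in genes
    )

# Registry of epistasis codes; further converters would be added under "d", "r", "dr", "cl", "dd"
CONVERTERS = {"0": no_epistasis}

def convert_with_registry(geno, epis, pairs=None):
    """Pick the converter for the given epistasis code and apply it."""
    try:
        converter = CONVERTERS[epis]
    except KeyError:
        raise ValueError(f"unknown epistasis code: {epis!r}") from None
    return converter(geno, pairs)

print(convert_with_registry("AaBbCCddEe", "0"))
```

With this layout, supporting a new type of epistasis only needs one new dictionary entry.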
Now that we have a tool to convert any genotype to its phenotype given an epistatic relationship between any two genes, we can modify the phen_ratio_multiple_genes and cross_table functions above to account for epistasis.
First, we modify phen_ratio_multiple_genes.
def phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs = None): #include 2 new parameters, "epis" and "pairs"
    '''
    Returns a dictionary representing the phenotypic ratio of the cross. The keys of the dictionary are the possible
    phenotypes and the values are the ratios. "epis" refers to the type of epistasis present and "pairs" refers to the
    pairs of epistatic genes involved in the epistasis.
    '''
    n = int(len(parent1)/2) #number of genes
    gen_ratio = gen_ratio_multiple_gene(parent1, parent2)
    genotypes = gen_ratio.keys()
    phenotypes = {}
    for geno in genotypes:
        pheno = convert_gen_to_phen_with_epi(geno, epis, pairs) #convert_gen_to_phen_with_epi replaces the previous convert_gen_to_phen
        phenotypes[pheno] = phenotypes.get(pheno,0) + int(gen_ratio[geno]*4**n) #add this genotype's integer count to its phenotype; get() gives a default of 0 for phenotypes not seen before
    sorted_phen = dict(sorted(phenotypes.items(), key = lambda x: x[1], reverse = True))
    return sorted_phen #the rest is the same
## Sample implementation ##
###########################
parent1 = "AaBbCc"
parent2 = "AaBbCc"
epis = "0" #no epistasis
pairs = ()
print("## No epistasis ##")
display_phen(phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs))
## No epistasis ##
| F1 Phenotype | Ratio |
|---|---|
| A expressed - B expressed - C expressed | 27 |
| A expressed - B expressed - C not expressed | 9 |
| A expressed - B not expressed - C expressed | 9 |
| A not expressed - B expressed - C expressed | 9 |
| A expressed - B not expressed - C not expressed | 3 |
| A not expressed - B expressed - C not expressed | 3 |
| A not expressed - B not expressed - C expressed | 3 |
| A not expressed - B not expressed - C not expressed | 1 |
epis = "d" #dominant epistasis
pairs = (("a", "b"),)
print("## Dominant epistasis ##")
display_phen(phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs))
## Dominant epistasis ##
| F1 Phenotype | Ratio |
|---|---|
| A expressed - B not expressed - C expressed | 36 |
| A expressed - B not expressed - C not expressed | 12 |
| A not expressed - B expressed - C expressed | 9 |
| A not expressed - B expressed - C not expressed | 3 |
| A not expressed - B not expressed - C expressed | 3 |
| A not expressed - B not expressed - C not expressed | 1 |
epis = "r" #recessive epistasis
pairs = (("a", "b"),)
print("## Recessive epistasis ##")
display_phen(phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs))
## Recessive epistasis ##
| F1 Phenotype | Ratio |
|---|---|
| A expressed - B expressed - C expressed | 27 |
| A not expressed - B not expressed - C expressed | 12 |
| A expressed - B expressed - C not expressed | 9 |
| A expressed - B not expressed - C expressed | 9 |
| A not expressed - B not expressed - C not expressed | 4 |
| A expressed - B not expressed - C not expressed | 3 |
epis = "dr" #complementary loci epistasis
pairs = (("a", "b"),)
print("## Complementary loci/duplicate recessive epistasis ##")
display_phen(phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs))
## Complementary loci/duplicate recessive epistasis ##
| F1 Phenotype | Ratio |
|---|---|
| A expressed - B expressed - C expressed | 27 |
| A not expressed - B not expressed - C expressed | 21 |
| A expressed - B expressed - C not expressed | 9 |
| A not expressed - B not expressed - C not expressed | 7 |
epis = "dd" #duplicate dominant epistasis
pairs = (("a", "b"),)
print("## Duplicate dominant epistasis ##")
display_phen(phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs))
## Duplicate dominant epistasis ##
| F1 Phenotype | Ratio |
|---|---|
| A expressed - B expressed - C expressed | 45 |
| A expressed - B expressed - C not expressed | 15 |
| A not expressed - B not expressed - C expressed | 3 |
| A not expressed - B not expressed - C not expressed | 1 |
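These three-gene ratios can be cross-checked by hand. Gene C does not interact with A or B, so each ratio should equal the corresponding two-gene ratio multiplied by the 3:1 ratio of gene C, and the counts should add up to 4³ = 64. A minimal sketch of that check (the names `TWO_GENE` and `add_independent_gene` are hypothetical):

```python
from itertools import product

# Two-gene ratios for an AaBb x AaBb cross under each type of interaction
TWO_GENE = {
    "no epistasis": [9, 3, 3, 1],
    "dominant": [12, 3, 1],
    "recessive": [9, 3, 4],
    "complementary loci": [9, 7],
    "duplicate dominant": [15, 1],
}

def add_independent_gene(ratio, extra=(3, 1)):
    """Combine a ratio with the 3:1 ratio of one further, non-interacting gene."""
    return sorted((a * b for a, b in product(ratio, extra)), reverse=True)

for name, ratio in TWO_GENE.items():
    three_gene = add_independent_gene(ratio)
    assert sum(three_gene) == 4 ** 3   # 64 equally likely offspring combinations
    print(name, three_gene)
```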
Finally, we modify the cross_table function to include the effect of epistasis.
def cross_table_with_epi(parent1, parent2, epis, pairs):
    '''
    Returns a Punnett Square representing the odds of each phenotype and legends that include name of phenotype, colour
    and ratio.
    '''
    parent1_gametes = list(itertools.combinations(parent1, int(len(parent1) / 2))) #creates all possible combinations of n letters in the genotype of parent1, where n is the number of genes in genotype
    parent1_gametes = delete_same_letters(parent1_gametes) #delete those combinations with repeated letters, only those with distinct letters remain -> these are the possible combinations to form gametes
    parent2_gametes = list(itertools.combinations(parent2, int(len(parent2) / 2))) #same as above
    parent2_gametes = delete_same_letters(parent2_gametes) #same as above
    for i in range(len(parent1_gametes)):
        parent1_gametes[i] = ''.join(parent1_gametes[i]) + f' ({i + 1})' #join the letters in each combination together to form gametes, with the index appended at the back
    for i in range(len(parent2_gametes)):
        parent2_gametes[i] = ''.join(parent2_gametes[i]) + f' ({i + 1})' #same as above
    #create a dataframe that represents the Punnett Square
    square = pd.DataFrame(columns=pd.MultiIndex.from_product([['Parent 2 Gametes'], parent2_gametes]),
                          index=pd.MultiIndex.from_product([['Parent 1 Gametes'], parent1_gametes]))
    legends = {"phenotypes" : [], "colours" : [], "ratios": [] } #create a dictionary of legends
    phenotype_with_ratios = phen_ratio_multiple_genes_with_epi(parent1, parent2, epis, pairs)
    phenotypes = list(phenotype_with_ratios.keys()) #phenotypes sorted by ratio, largest first
    genotypes = [] #see yellow box below for reasons why we need to create a new list of genotypes from gametes
    for row in range(len(parent1_gametes)):
        for column in range(len(parent2_gametes)):
            #Join the gametes from each parent (without the " (n)" suffix) to form unordered genotypes,
            #then sort the letters of every gene so that the capital letters are in front
            genotype = ''.join([''.join(i) for i in zip(parent1_gametes[row][:len(parent1_gametes[0]) - 4], parent2_gametes[column][:len(parent2_gametes[0]) - 4])])
            reformed_genotype = ''
            for i in range(0, len(genotype), 2):
                reformed_genotype += ''.join(sorted(genotype[i:i+2]))
            square.iloc[row, column] = reformed_genotype #add the reformed genotypes into the dataframe
            genotypes.append(reformed_genotype)
    for geno in genotypes: #for each genotype in genotypes
        phenotype = convert_gen_to_phen_with_epi(geno, epis, pairs)
        colour = ALL_COLOURS[phenotypes.index(phenotype)] #colour index is the position of the phenotype in the sorted list of possible phenotypes
        if phenotype not in legends["phenotypes"]: #add the phenotype and its colour to legends if not already inside
            legends["phenotypes"].append(phenotype)
            legends["colours"].append(colour)
    for pheno in legends["phenotypes"]:
        legends["ratios"].append(phenotype_with_ratios[pheno]) #add to legends each phenotype's ratio
    legends_table = pd.DataFrame.from_dict(legends).set_index("phenotypes").sort_values(by = "ratios", ascending = False)
    def cell_colour(genotype): #this function returns the background colour corresponding to each genotype
        phenotype = convert_gen_to_phen_with_epi(genotype, epis, pairs)
        index = legends["phenotypes"].index(phenotype) #using the index of "phenotypes" in legends dict
        colour = legends["colours"][index] #to get the same colour that the legend shows for this phenotype
        return f"background-color: {colour}"
    def legend_colour(colour): #this function returns the background colour for each colour entry in the legends
        return f"background-color: {colour}"
    square = square.style.applymap(cell_colour) #apply background colours to each genotype in square
    legends_table = legends_table.style.applymap(legend_colour, subset = "colours") #apply background colours to legends
    return square, legends_table
## Sample implementation ##
###########################
parent3 = "AaBbCc"
parent4 = "AaBbCc"
epis = "0" #no epistasis
pairs = ()
print("## No epistasis ##")
display_cross(cross_table_with_epi(parent3, parent4, epis, pairs))
## No epistasis ##
| Parent 1 Gametes (rows) / Parent 2 Gametes (columns) | ABC (1) | ABc (2) | AbC (3) | Abc (4) | aBC (5) | aBc (6) | abC (7) | abc (8) |
|---|---|---|---|---|---|---|---|---|
| ABC (1) | AABBCC | AABBCc | AABbCC | AABbCc | AaBBCC | AaBBCc | AaBbCC | AaBbCc |
| ABc (2) | AABBCc | AABBcc | AABbCc | AABbcc | AaBBCc | AaBBcc | AaBbCc | AaBbcc | |
| AbC (3) | AABbCC | AABbCc | AAbbCC | AAbbCc | AaBbCC | AaBbCc | AabbCC | AabbCc | |
| Abc (4) | AABbCc | AABbcc | AAbbCc | AAbbcc | AaBbCc | AaBbcc | AabbCc | Aabbcc | |
| aBC (5) | AaBBCC | AaBBCc | AaBbCC | AaBbCc | aaBBCC | aaBBCc | aaBbCC | aaBbCc | |
| aBc (6) | AaBBCc | AaBBcc | AaBbCc | AaBbcc | aaBBCc | aaBBcc | aaBbCc | aaBbcc | |
| abC (7) | AaBbCC | AaBbCc | AabbCC | AabbCc | aaBbCC | aaBbCc | aabbCC | aabbCc | |
| abc (8) | AaBbCc | AaBbcc | AabbCc | Aabbcc | aaBbCc | aaBbcc | aabbCc | aabbcc | |
display_legends(cross_table_with_epi(parent3, parent4, epis, pairs))
| phenotypes | colours | ratios |
|---|---|---|
| A expressed - B expressed - C expressed | #acc2d9 | 27 |
| A expressed - B expressed - C not expressed | #56ae57 | 9 |
| A expressed - B not expressed - C expressed | #b2996e | 9 |
| A not expressed - B expressed - C expressed | #a8ff04 | 9 |
| A expressed - B not expressed - C not expressed | #69d84f | 3 |
| A not expressed - B expressed - C not expressed | #894585 | 3 |
| A not expressed - B not expressed - C expressed | #70b23f | 3 |
| A not expressed - B not expressed - C not expressed | #d4ffff | 1 |
epis = "d" #dominant epistasis
pairs = (("a", "b"),)
print("## Dominant epistasis ##")
display_cross(cross_table_with_epi(parent3, parent4, epis, pairs))
## Dominant epistasis ##
| Parent 1 Gametes (rows) / Parent 2 Gametes (columns) | ABC (1) | ABc (2) | AbC (3) | Abc (4) | aBC (5) | aBc (6) | abC (7) | abc (8) |
|---|---|---|---|---|---|---|---|---|
| ABC (1) | AABBCC | AABBCc | AABbCC | AABbCc | AaBBCC | AaBBCc | AaBbCC | AaBbCc |
| ABc (2) | AABBCc | AABBcc | AABbCc | AABbcc | AaBBCc | AaBBcc | AaBbCc | AaBbcc | |
| AbC (3) | AABbCC | AABbCc | AAbbCC | AAbbCc | AaBbCC | AaBbCc | AabbCC | AabbCc | |
| Abc (4) | AABbCc | AABbcc | AAbbCc | AAbbcc | AaBbCc | AaBbcc | AabbCc | Aabbcc | |
| aBC (5) | AaBBCC | AaBBCc | AaBbCC | AaBbCc | aaBBCC | aaBBCc | aaBbCC | aaBbCc | |
| aBc (6) | AaBBCc | AaBBcc | AaBbCc | AaBbcc | aaBBCc | aaBBcc | aaBbCc | aaBbcc | |
| abC (7) | AaBbCC | AaBbCc | AabbCC | AabbCc | aaBbCC | aaBbCc | aabbCC | aabbCc | |
| abc (8) | AaBbCc | AaBbcc | AabbCc | Aabbcc | aaBbCc | aaBbcc | aabbCc | aabbcc | |
display_legends(cross_table_with_epi(parent3, parent4, epis, pairs))
| phenotypes | colours | ratios |
|---|---|---|
| A expressed - B not expressed - C expressed | #acc2d9 | 36 |
| A expressed - B not expressed - C not expressed | #56ae57 | 12 |
| A not expressed - B expressed - C expressed | #b2996e | 9 |
| A not expressed - B expressed - C not expressed | #a8ff04 | 3 |
| A not expressed - B not expressed - C expressed | #69d84f | 3 |
| A not expressed - B not expressed - C not expressed | #894585 | 1 |
epis = "r" #recessive epistasis
pairs = (("a", "b"),)
print("## Recessive epistasis ##")
display_cross(cross_table_with_epi(parent3, parent4, epis, pairs))
## Recessive epistasis ##
| Parent 1 Gametes (rows) / Parent 2 Gametes (columns) | ABC (1) | ABc (2) | AbC (3) | Abc (4) | aBC (5) | aBc (6) | abC (7) | abc (8) |
|---|---|---|---|---|---|---|---|---|
| ABC (1) | AABBCC | AABBCc | AABbCC | AABbCc | AaBBCC | AaBBCc | AaBbCC | AaBbCc |
| ABc (2) | AABBCc | AABBcc | AABbCc | AABbcc | AaBBCc | AaBBcc | AaBbCc | AaBbcc | |
| AbC (3) | AABbCC | AABbCc | AAbbCC | AAbbCc | AaBbCC | AaBbCc | AabbCC | AabbCc | |
| Abc (4) | AABbCc | AABbcc | AAbbCc | AAbbcc | AaBbCc | AaBbcc | AabbCc | Aabbcc | |
| aBC (5) | AaBBCC | AaBBCc | AaBbCC | AaBbCc | aaBBCC | aaBBCc | aaBbCC | aaBbCc | |
| aBc (6) | AaBBCc | AaBBcc | AaBbCc | AaBbcc | aaBBCc | aaBBcc | aaBbCc | aaBbcc | |
| abC (7) | AaBbCC | AaBbCc | AabbCC | AabbCc | aaBbCC | aaBbCc | aabbCC | aabbCc | |
| abc (8) | AaBbCc | AaBbcc | AabbCc | Aabbcc | aaBbCc | aaBbcc | aabbCc | aabbcc | |
display_legends(cross_table_with_epi(parent3, parent4, epis, pairs))
| phenotypes | colours | ratios |
|---|---|---|
| A expressed - B expressed - C expressed | #acc2d9 | 27 |
| A not expressed - B not expressed - C expressed | #56ae57 | 12 |
| A expressed - B expressed - C not expressed | #b2996e | 9 |
| A expressed - B not expressed - C expressed | #a8ff04 | 9 |
| A not expressed - B not expressed - C not expressed | #69d84f | 4 |
| A expressed - B not expressed - C not expressed | #894585 | 3 |
epis = "dr" #complementary loci epistasis
pairs = (("a", "b"),)
print("## Complementary loci/duplicate recessive epistasis ##")
display_cross(cross_table_with_epi(parent3, parent4, epis, pairs))
## Complementary loci/duplicate recessive epistasis ##
| Parent 1 Gametes (rows) / Parent 2 Gametes (columns) | ABC (1) | ABc (2) | AbC (3) | Abc (4) | aBC (5) | aBc (6) | abC (7) | abc (8) |
|---|---|---|---|---|---|---|---|---|
| ABC (1) | AABBCC | AABBCc | AABbCC | AABbCc | AaBBCC | AaBBCc | AaBbCC | AaBbCc |
| ABc (2) | AABBCc | AABBcc | AABbCc | AABbcc | AaBBCc | AaBBcc | AaBbCc | AaBbcc | |
| AbC (3) | AABbCC | AABbCc | AAbbCC | AAbbCc | AaBbCC | AaBbCc | AabbCC | AabbCc | |
| Abc (4) | AABbCc | AABbcc | AAbbCc | AAbbcc | AaBbCc | AaBbcc | AabbCc | Aabbcc | |
| aBC (5) | AaBBCC | AaBBCc | AaBbCC | AaBbCc | aaBBCC | aaBBCc | aaBbCC | aaBbCc | |
| aBc (6) | AaBBCc | AaBBcc | AaBbCc | AaBbcc | aaBBCc | aaBBcc | aaBbCc | aaBbcc | |
| abC (7) | AaBbCC | AaBbCc | AabbCC | AabbCc | aaBbCC | aaBbCc | aabbCC | aabbCc | |
| abc (8) | AaBbCc | AaBbcc | AabbCc | Aabbcc | aaBbCc | aaBbcc | aabbCc | aabbcc | |
display_legends(cross_table_with_epi(parent3, parent4, epis, pairs))
| phenotypes | colours | ratios |
|---|---|---|
| A expressed - B expressed - C expressed | #acc2d9 | 27 |
| A not expressed - B not expressed - C expressed | #56ae57 | 21 |
| A expressed - B expressed - C not expressed | #b2996e | 9 |
| A not expressed - B not expressed - C not expressed | #a8ff04 | 7 |
epis = "dd" #duplicate dominant epistasis
pairs = (("a", "b"),)
print("## Duplicate dominant epistasis ##")
display_cross(cross_table_with_epi(parent3, parent4, epis, pairs))
## Duplicate dominant epistasis ##
| Parent 1 Gametes (rows) / Parent 2 Gametes (columns) | ABC (1) | ABc (2) | AbC (3) | Abc (4) | aBC (5) | aBc (6) | abC (7) | abc (8) |
|---|---|---|---|---|---|---|---|---|
| ABC (1) | AABBCC | AABBCc | AABbCC | AABbCc | AaBBCC | AaBBCc | AaBbCC | AaBbCc |
| ABc (2) | AABBCc | AABBcc | AABbCc | AABbcc | AaBBCc | AaBBcc | AaBbCc | AaBbcc | |
| AbC (3) | AABbCC | AABbCc | AAbbCC | AAbbCc | AaBbCC | AaBbCc | AabbCC | AabbCc | |
| Abc (4) | AABbCc | AABbcc | AAbbCc | AAbbcc | AaBbCc | AaBbcc | AabbCc | Aabbcc | |
| aBC (5) | AaBBCC | AaBBCc | AaBbCC | AaBbCc | aaBBCC | aaBBCc | aaBbCC | aaBbCc | |
| aBc (6) | AaBBCc | AaBBcc | AaBbCc | AaBbcc | aaBBCc | aaBBcc | aaBbCc | aaBbcc | |
| abC (7) | AaBbCC | AaBbCc | AabbCC | AabbCc | aaBbCC | aaBbCc | aabbCC | aabbCc | |
| abc (8) | AaBbCc | AaBbcc | AabbCc | Aabbcc | aaBbCc | aaBbcc | aabbCc | aabbcc | |
display_legends(cross_table_with_epi(parent3, parent4, epis, pairs))
| phenotypes | colours | ratios |
|---|---|---|
| A expressed - B expressed - C expressed | #acc2d9 | 45 |
| A expressed - B expressed - C not expressed | #56ae57 | 15 |
| A not expressed - B not expressed - C expressed | #b2996e | 3 |
| A not expressed - B not expressed - C not expressed | #a8ff04 | 1 |
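The grid of genotypes in the Punnett squares above can also be reproduced with the standard library alone. The sketch below is a simplified illustration, not the project's implementation: `gametes` and `punnett_grid` are hypothetical helpers that assume every gene is written as two adjacent letters, and they leave out the colouring and legends.

```python
from itertools import product

def gametes(parent):
    """List the gametes of a parent genotype such as 'AaBb' (one allele per gene)."""
    genes = [parent[i:i + 2] for i in range(0, len(parent), 2)]
    return ["".join(alleles) for alleles in product(*genes)]

def punnett_grid(parent1, parent2):
    """Return (row gametes, column gametes, grid of offspring genotypes)."""
    rows, cols = gametes(parent1), gametes(parent2)
    grid = [
        # sorted() puts the capital letter first because uppercase sorts before lowercase
        ["".join("".join(sorted(a + b)) for a, b in zip(r, c)) for c in cols]
        for r in rows
    ]
    return rows, cols, grid

rows, cols, grid = punnett_grid("AaBbCc", "AaBbCc")
print(rows)         # gametes in the same order as the table headers
print(grid[3][5])   # offspring of gamete Abc (row 4) and gamete aBc (column 6)
```

Homozygous genes produce repeated gametes in this sketch, so every cell of the grid still carries equal weight.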
Various research papers deal with epistasis governing phenotypes such as seed colour and disease resistance. We have selected four research papers that have successfully modelled epistasis in their subjects of study.
Paper 1: Epistasis in genes governing seed colour in Trifolium alexandrinum5
Seeds of the Egyptian clover plant come in either yellow or black colour. In an interspecific cross with 2 different species of the plant, the F1 generation was black while the F2 generation was a mix of both yellow and black. Yellow seeds were found to be true breeding while black seeds were either true breeding or produced a 9:7 ratio which indicates complementary loci epistasis.
Paper 2: Epistasis in downy mildew resistance in cucumbers6
Downy mildew is a disease caused by fungus in cucumbers. Some cucumbers are immune to this disease owing to resistance genes they have. To investigate the interactions surrounding these genes, researchers crossed 2 lines of the cucumber species, one resistant and one susceptible. The F1 generation was self-pollinated to form the F2 generation. The F2 generation was then tested for resistance and the phenotype appeared in a 9:7 ratio, once again indicating complementary epistasis.
## Both of the papers above can be modelled using complementary loci epistasis model ##
'''
Paper 1 - Let the genes be Y and B. When either Y or B is recessive (yy__ or __bb), both genes will not be expressed
and a yellow seed is formed. In other cases, a black seed is formed.
'''
display_phen(phen_ratio_multiple_genes_with_epi("YyBb", "YyBb", "cl", pairs = (("y", "b"),)))
| F1 Phenotype | Ratio |
|---|---|
| Y expressed - B expressed | 9 |
| Y not expressed - B not expressed | 7 |
'''
Paper 2 - Let A and B be the genes for resistance to Downy Mildew. When either A or B is recessive (aa__ or __bb), both
genes will not be expressed, and resistance is not observed. In other cases, resistance is observed.
'''
display_phen(phen_ratio_multiple_genes_with_epi("AaBb", "AaBb", "cl", pairs = (("a", "b"),)))
| F1 Phenotype | Ratio |
|---|---|
| A expressed - B expressed | 9 |
| A not expressed - B not expressed | 7 |
Paper 3: Epistasis in barley awn genes7
Awns, a part of the barley plant, come in various phenotypes such as hooded, normal, leafy and long. These phenotypes are controlled by awnness loci that interact with each other. Crosses between different phenotypes resulted in various epistatic ratios. These ratios are summarised in the table below:
## Complementary epistasis ##
display_phen(phen_ratio_multiple_genes_with_epi("NnHh", "NnHh", "cl", pairs = (("n", "h"),)))
| F1 Phenotype | Ratio |
|---|---|
| N expressed - H expressed | 9 |
| N not expressed - H not expressed | 7 |
## Dominant duplicate epistasis ##
## Let Lel1 be N and Lel2 be M ##
display_phen(phen_ratio_multiple_genes_with_epi("NnMm", "NnMm", "dd", pairs = (("n", "m"),)))
| F1 Phenotype | Ratio |
|---|---|
| N expressed - M expressed | 15 |
| N not expressed - M not expressed | 1 |
## Recessive epistasis ##
## Let Lks be L and Kap1 be K ##
display_phen(phen_ratio_multiple_genes_with_epi("LlKk", "LlKk", "r", pairs = (("l", "k"),)))
| F1 Phenotype | Ratio |
|---|---|
| L expressed - K expressed | 9 |
| L not expressed - K not expressed | 4 |
| L expressed - K not expressed | 3 |
## Dominant epistasis ##
## Let Lsa1 be L and Kap1 be K ##
display_phen(phen_ratio_multiple_genes_with_epi("LlKk", "LlKk", "d", pairs = (("l", "k"),)))
| F1 Phenotype | Ratio |
|---|---|
| L expressed - K not expressed | 12 |
| L not expressed - K expressed | 3 |
| L not expressed - K not expressed | 1 |
Paper 4: Epistasis in leprosy8
Biomolecules released in the tissue are important for maintaining homeostasis as well as tissue integrity. When the equilibrium of these molecules is disrupted by pathogens such as M. leprae, it can lead to tissue damage and result in diseases such as leprosy. To prevent this, the immune system responds to the pathogen by varying the levels of these biomolecules and bringing them back to equilibrium. The production of these biomolecules is controlled by gene expression between different molecules in the response pathway. Whether the levels of the molecules are high or low depends on dominant and recessive characteristics. To explore this further, researchers tested the interaction between different molecules in this pathway. Their experiment yielded varying epistatic ratios, summarised in the table below:
## For simplicity reasons, the genes are labelled as "A" and "B".
## Complementary epistasis ##
print(phen_ratio_multiple_genes_with_epi("AaBb", "AaBb", "cl", pairs = (("a", "b"),)))
## Dominant duplicate epistasis ##
print(phen_ratio_multiple_genes_with_epi("AaBb", "AaBb", "dd", pairs = (("a", "b"),)))
## Recessive epistasis ##
print(phen_ratio_multiple_genes_with_epi("AaBb", "AaBb", "r", pairs = (("a", "b"),)))
## Dominant epistasis ##
print(phen_ratio_multiple_genes_with_epi("AaBb", "AaBb", "d", pairs = (("a", "b"),)))
{'A expressed - B expressed': 9, 'A not expressed - B not expressed': 7}
{'A expressed - B expressed': 15, 'A not expressed - B not expressed': 1}
{'A expressed - B expressed': 9, 'A not expressed - B not expressed': 4, 'A expressed - B not expressed': 3}
{'A expressed - B not expressed': 12, 'A not expressed - B expressed': 3, 'A not expressed - B not expressed': 1}
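The program reproduces the classic ratios (9:7, 15:1, 9:3:4 and 12:3:1) associated with the types of epistasis reported in these papers. Hence, we are confident that it predicts phenotypic ratios correctly for these types of epistasis.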
Epistasis involving more than two genes has recently attracted considerable interest, as many scientists use epistasis analysis to examine the relationships among different genes. Researchers cross parents of different phenotypes and, from the phenotypic ratio observed in the offspring, hypothesise which pathways the genes belong to and the order in which the genes act. Our model is therefore designed to predict the phenotypic ratio of a cross given a hypothesised order and interaction of genes.
It should be noted that crosses involving interactions among multiple genes produce phenotypic ratios that differ considerably from the two-gene epistatic ratios seen thus far. This is because the pathways these genes take part in are complex and give rise to relationships that simple two-gene models cannot capture.
A slightly modified version of the phen_ratio_multiple_genes_with_epi function is used to calculate the phenotypic ratio. Since the conversion from genotype to phenotype is pathway-specific, a separate conversion function must be written for each pathway. However, this is rather simple and can be done by the user after coming up with a hypothetical pathway that the genes are involved in.*
def phen_ratio_multiple_genes_multiple_epi(parent1, parent2, conversion):  # include a new parameter
    '''
    Returns a dictionary representing the phenotypic ratio of the cross. The keys of the dictionary are the possible
    phenotypes and the values are the ratios. Conversion is the type of geno-to-phenotype conversion that the scientist
    predicts the genes would have.
    '''
    n = len(parent1) // 2  # number of genes (two alleles per gene)
    gen_ratio = gen_ratio_multiple_gene(parent1, parent2)
    genotypes = gen_ratio.keys()
    phenotypes = {}
    for geno in genotypes:
        # convert_gen_to_phen_with_epi is replaced by a more case-specific conversion function
        pheno = conversion(geno)
        # Attach integer values to each phenotype; get() gives a default of 0 to phenotypes not yet seen
        phenotypes[pheno] = phenotypes.get(pheno, 0) + int(gen_ratio[geno] * 4**n)
    sorted_phen = dict(sorted(phenotypes.items(), key=lambda x: x[1], reverse=True))
    return sorted_phen  # the rest is the same
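To make the role of the conversion parameter concrete, here is a minimal, self-contained sketch of the same idea. It does not reproduce our gen_ratio_multiple_gene function: the helpers gametes, sketch_gen_ratio, sketch_phen_ratio and any_dominant are hypothetical stand-ins written only for this illustration. They assume independent assortment and genotype strings such as "AaBb", with the two alleles of each gene written side by side.

```python
from fractions import Fraction
from itertools import product


def gametes(parent):
    """Return every gamete of a parent such as 'AaBb' (one allele per gene)."""
    genes = [parent[i:i + 2] for i in range(0, len(parent), 2)]
    return ["".join(alleles) for alleles in product(*genes)]


def sketch_gen_ratio(parent1, parent2):
    """Hypothetical stand-in for gen_ratio_multiple_gene: genotype -> probability."""
    g1, g2 = gametes(parent1), gametes(parent2)
    weight = Fraction(1, len(g1) * len(g2))  # every gamete pairing is equally likely
    ratio = {}
    for a, b in product(g1, g2):
        # sorted() puts the uppercase (dominant) allele first, e.g. 'aA' -> 'Aa'
        geno = "".join("".join(sorted(pair)) for pair in zip(a, b))
        ratio[geno] = ratio.get(geno, 0) + weight
    return ratio


def sketch_phen_ratio(parent1, parent2, conversion):
    """Group genotype probabilities by phenotype and scale them to counts out of 4**n."""
    n = len(parent1) // 2
    counts = {}
    for geno, prob in sketch_gen_ratio(parent1, parent2).items():
        pheno = conversion(geno)
        counts[pheno] = counts.get(pheno, 0) + int(prob * 4 ** n)
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


def any_dominant(genotype):
    """Toy conversion: 'expressed' if at least one dominant allele is present."""
    return "expressed" if any(ch.isupper() for ch in genotype) else "not expressed"


print(sketch_phen_ratio("AaBb", "AaBb", any_dominant))
# {'expressed': 15, 'not expressed': 1}
```

Swapping any_dominant for a different function changes the predicted ratio without touching the rest of the calculation, which is the design choice behind passing the conversion in as an argument.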
Next, we refer to some examples from research papers to illustrate the predictive power of the program. Given the following pathways, the phenotypic ratio can be predicted for a cross between any two parents. Some possible conversions:
Let C/c, G/g and B/b denote the alleles of the curly, grinch and bubblehead genes respectively. In the paper, the scientists predict recessive epistasis, with c being epistatic to both G and B. This means that whenever the cc genotype is present, neither G nor B will be expressed. When heterozygous parents are crossed, the paper predicts a ratio of 27:16:9:9:3.
def mutant_gen_to_phen(genotype):
    pheno = ""
    if "cc" in genotype:
        pheno = "curly - not grinch - not bubblehead"
        return pheno
    else:
        pheno += "not curly - "
    if "G" in genotype:
        pheno += "grinch - "
    else:
        pheno += "not grinch - "
    if "B" in genotype:
        pheno += "bubblehead - "
    else:
        pheno += "not bubblehead - "
    return pheno[:-3]
## Sample implementation ##
###########################
print("## Mutants in inbred X. tropicalis, expected ratio of 27:16:9:9:3 ##")
parent5 = "CcGgBb"
parent6 = "CcGgBb"
conversion = mutant_gen_to_phen
phen_ratio = phen_ratio_multiple_genes_multiple_epi(parent5, parent6, conversion)
display_phen(phen_ratio)
## Mutants in inbred X. tropicalis, expected ratio of 27:16:9:9:3 ##

| F1 Phenotype | Ratio |
| --- | --- |
| not curly - grinch - bubblehead | 27 |
| curly - not grinch - not bubblehead | 16 |
| not curly - grinch - not bubblehead | 9 |
| not curly - not grinch - bubblehead | 9 |
| not curly - not grinch - not bubblehead | 3 |
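The 27:16:9:9:3 ratio can also be checked by hand. This is a worked sketch, assuming independent assortment and complete dominance: in a CcGgBb × CcGgBb cross, each gene has at least one dominant allele with probability 3/4 and is homozygous recessive with probability 1/4. Scaling to 4³ = 64 offspring and writing an underscore for "either allele":

```latex
\begin{aligned}
C\_\,G\_\,B\_ &: \tfrac{3}{4}\cdot\tfrac{3}{4}\cdot\tfrac{3}{4}\cdot 64 = 27 \\
cc\ (\text{any } G,\ B) &: \tfrac{1}{4}\cdot 64 = 16 \\
C\_\,G\_\,bb &: \tfrac{3}{4}\cdot\tfrac{3}{4}\cdot\tfrac{1}{4}\cdot 64 = 9 \\
C\_\,gg\,B\_ &: \tfrac{3}{4}\cdot\tfrac{1}{4}\cdot\tfrac{3}{4}\cdot 64 = 9 \\
C\_\,gg\,bb &: \tfrac{3}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}\cdot 64 = 3
\end{aligned}
```

The five classes sum to 64, as they should.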
Let lin-26-coding allele be A, mutant allele be a.
Let lin-39-coding allele be B, mutant allele be b.
Let let-23-coding allele be C, mutant allele be c.
def vulva1_gen_to_phen(genotype):
    for i in genotype:
        if i.isupper():
            return "vulva"
    return "vulvaless"
## Sample implementation ##
###########################
print("## Vulva formation (Pathway 1) ##")
parent7 = "AaBbCc"
parent8 = "aabbcc"
conversion = vulva1_gen_to_phen
phen_ratio = phen_ratio_multiple_genes_multiple_epi(parent7, parent8, conversion)
display_phen(phen_ratio)
## Vulva formation (Pathway 1) ##

| F1 Phenotype | Ratio |
| --- | --- |
| vulva | 56 |
| vulvaless | 8 |
Let lin-15-coding allele be A, mutant allele be a.
Let let-23-coding allele be B, mutant allele be b.
Let lin-1-coding allele be C, mutant allele be c.
def vulva2_gen_to_phen(genotype):
    # A functional lin-15 allele (A) silences let-23 (B)
    if "A" in genotype:
        genotype = genotype.replace("B", "b")
    # A let-23 allele (B) that is still active silences lin-1 (C)
    if "B" in genotype:
        genotype = genotype.replace("C", "c")
    # An active lin-1 allele (C) blocks vulva formation
    if "C" in genotype:
        return "vulvaless"
    return "vulva"
## Sample implementation ##
###########################
print("## Vulva formation (Pathway 2) ##")
conversion = vulva2_gen_to_phen
phen_ratio = phen_ratio_multiple_genes_multiple_epi(parent7, parent8, conversion)
display_phen(phen_ratio)
## Vulva formation (Pathway 2) ##

| F1 Phenotype | Ratio |
| --- | --- |
| vulva | 40 |
| vulvaless | 24 |
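The 40:24 result can be cross-checked independently of the program. In the test cross AaBbCc × aabbcc, each gene carries a dominant allele in half of the offspring, so the eight presence/absence classes are equally likely. The sketch below reads the conversion as a chain of inhibitions (A inhibits B, B inhibits C, C blocks vulva formation). The function pathway2_phenotype and its boolean arguments are hypothetical names used only for this check.

```python
from itertools import product


def pathway2_phenotype(has_A, has_B, has_C):
    """Chain of inhibitions: A silences B, active B silences C, active C blocks the vulva."""
    b_active = has_B and not has_A
    c_active = has_C and not b_active
    return "vulvaless" if c_active else "vulva"


counts = {}
for has_A, has_B, has_C in product([True, False], repeat=3):
    pheno = pathway2_phenotype(has_A, has_B, has_C)
    # Each of the 8 classes makes up 1/8 of the 4**3 = 64 offspring
    counts[pheno] = counts.get(pheno, 0) + 64 // 8

print(counts["vulva"], counts["vulvaless"])  # 40 24
```

Five of the eight classes form a vulva and three do not, which scales to the same 40:24 split.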
Let ced-9-coding allele be A, mutant allele be a.
Let ced-3-coding allele be B, mutant allele be b.
Let ced-1-coding allele be C, mutant allele be c.
def engulfment_gen_to_phen(genotype):
    if "A" in genotype:
        return "not engulfed"
    if "B" in genotype:
        if "C" in genotype:
            return "engulfed"
    return "not engulfed"
## Sample implementation ##
###########################
print("## Engulfment of dead cells ##")
parent9 = "AaBbCC"
parent10 = "AaBbcc"
conversion = engulfment_gen_to_phen
phen_ratio = phen_ratio_multiple_genes_multiple_epi(parent9, parent10, conversion)
display_phen(phen_ratio)
## Engulfment of dead cells ##

| F1 Phenotype | Ratio |
| --- | --- |
| not engulfed | 52 |
| engulfed | 12 |
Let alpha-factor-coding allele be A, mutant allele be a.
Let cdc-28-coding allele be B, mutant allele be b.
Let cdc-4-coding allele be C, mutant allele be c.
Let cdc-7-coding allele be D, mutant allele be d.
def dna_synthesis_gen_to_phen(genotype):
    # Each allele needs its own membership test: "A" and "B" in genotype would only check for "B"
    if "A" in genotype and "B" in genotype:
        if "C" in genotype:
            if "D" in genotype:
                return "synthesis"
    return "non-synthesis"
## Sample implementation ##
###########################
print("## DNA synthesis ##")
parent11 = "AabbCCDd"
parent12 = "AaBbccDd"
conversion = dna_synthesis_gen_to_phen
phen_ratio = phen_ratio_multiple_genes_multiple_epi(parent11, parent12, conversion)
display_phen(phen_ratio)
## DNA synthesis ##

| F1 Phenotype | Ratio |
| --- | --- |
| non-synthesis | 184 |
| synthesis | 72 |
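One Python detail is worth keeping in mind when writing conversion functions that require several alleles at once. The expression `"A" and "B" in genotype` is evaluated as `"A" and ("B" in genotype)`, and since the non-empty string "A" is always truthy, it only tests for "B". The short sketch below demonstrates this on a made-up genotype. The helper has_all is a hypothetical convenience function, not part of our program.

```python
genotype = "aaBbCcDd"  # hypothetical genotype with no dominant A allele


def has_all(genotype, alleles):
    """Return True only if every listed allele occurs in the genotype string."""
    return all(allele in genotype for allele in alleles)


chained = bool("A" and "B" in genotype)               # only checks for "B"
explicit = "A" in genotype and "B" in genotype        # checks for both alleles

print(chained, explicit, has_all(genotype, "AB"))  # True False False
```

Writing one membership test per allele, or using a helper such as has_all, keeps the conversion faithful to the hypothesised pathway.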
Our program has been shown to reliably model many epistatic relationships between two or more genes, including dominant, recessive, complementary and duplicate epistasis. In addition, since users can write their own conversion function to fit the relationship they define between their genes of interest, the program is highly customisable and serves as a powerful tool for predicting the phenotypic ratios of any combination of alleles that are crossed.
Moreover, our code provides a way for scientists as well as science students to visualise the phenotypic ratio of any number of genes when crossed together, in the form of a table or a Punnett square. The display_phen function allows the user to observe the phenotypic ratio in descending order, which could aid them in explaining and predicting trends in their own experiments. The tables can also be included in reports for reference. In addition, the use of colours in the Punnett square could aid in the identification of genotypes and the visualisation of the proportions of each phenotype in the offspring. A legend table can also be displayed in case there is difficulty in ascertaining which colour represents which phenotype.
Perhaps the most powerful feature of our program is that scientists can use it for any number of genes, although the runtime increases significantly as the number of genes increases. Because the program can reliably predict the phenotypic ratio, it can save time that would otherwise be spent conducting further experiments to verify the numbers. Thus, the program can potentially help geneticists minimise their costs and use of resources. Furthermore, although there are many online "phenotypic ratio calculators" available for public use, the majority of them only cover non-epistatic crosses of at most three genes. Hence, our program serves as a starting point for a more reliable and flexible tool to predict this ratio (see Extensions to understand why this ratio could be very important in academia).
However, there are future directions that could be taken to improve our program. Overall, our code may be inefficient in dealing with larger data sets. A cross between parents that are heterozygous for n genes has 4^n gamete combinations, so the running time grows exponentially with the number of genes, and a genotype consisting of 100 genes would be far beyond what the program can handle. In addition, because the Punnett square is very generalisable (able to handle as many genes as desired), the bigger the genotype of the parents, the more difficult it is to ascertain trends and patterns from the Punnett square. The same problem may occur when creating the phenotypic table using display_phen. Therefore, more streamlining and simplification of the program can be done for handling larger sets of data.
Another possible point for improvement is the beginner-friendliness of the program. Since the program allows scientists to specify the relationships of their genes of interest and create their own functions to convert genotype to phenotype, it presumes that the scientists handling the data have at least a fundamental knowledge of coding. Since there is still a gap between science and informatics, users need to understand both the science and the computation behind genetics to operate the program. This may reduce the desirability of the program to researchers who have little to no skill in coding. The problem is especially pertinent when the dataset is large and the inter-gene relationships are convoluted.
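To give a feel for this growth, the following short sketch counts the cells of the Punnett square for parents that are heterozygous for every gene. Each parent then produces 2^n distinct gametes, so the square has 2^n × 2^n = 4^n cells. The function name punnett_cells is hypothetical and used only for this illustration.

```python
def punnett_cells(n_genes):
    """Cells in the Punnett square when both parents are heterozygous for every gene."""
    gametes_per_parent = 2 ** n_genes
    return gametes_per_parent ** 2  # equals 4 ** n_genes


for n in (1, 2, 3, 5, 10, 20):
    print(n, punnett_cells(n))
# 1 4
# 2 16
# 3 64
# 5 1024
# 10 1048576
# 20 1099511627776
```

Ten genes already give over a million cells, which is why both the computation and the visual Punnett square become impractical well before 100 genes.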
Finally, in the generalised functions of the program (for example, the convert_gen_to_phen_dom_epi function), the phenotypes produced are written in the form "expressed" or "not expressed". Even though this is technically accurate, it is difficult to ascertain whether a gene is "expressed" or not in practice. This arises from the fact that alleles could be co-dominant or incompletely dominant, which might produce intermediate phenotypes. Moreover, the program also makes a few assumptions. Firstly, we assume that there are only two possible alleles for a single gene, which might not be the case for some genes, such as the ABO blood group gene, which has three alleles (IA, IB and IO). Secondly, we assume that there is complete dominance in the genes of interest, although other forms of dominance exist (incomplete dominance, co-dominance, etc.). Lastly, we assume that there is no linkage between genes (that is, genes are assorted independently). Thus, a future extension of the program could be to complement the current code with another program that calculates phenotypic ratios when these assumptions are relaxed. This way, the program could be even more generalisable.
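As a rough illustration of how the complete-dominance assumption might be relaxed, the sketch below shows a conversion function for a single gene with incomplete dominance. It is not part of our program: the function name, the allele letters R/r and the colour labels are hypothetical, chosen after the textbook case of a red × white cross giving pink heterozygotes.

```python
def incomplete_dominance_gen_to_phen(genotype):
    """Hypothetical conversion for one gene (alleles R/r) with incomplete dominance."""
    copies = genotype.count("R")  # number of R alleles: 0, 1 or 2
    return {2: "red", 1: "pink", 0: "white"}[copies]


for genotype in ("RR", "Rr", "rr"):
    print(genotype, "->", incomplete_dominance_gen_to_phen(genotype))
# RR -> red
# Rr -> pink
# rr -> white
```

Because the conversion is passed in as an argument, a function of this shape could in principle be supplied to phen_ratio_multiple_genes_multiple_epi in place of the epistasis-specific conversions, although we have not tested this.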
In research papers testing for epistasis, the first step carried out by researchers is generally to test for interactions between genes. They then try to relate this gene interaction to a phenotype that is characteristic of the disease. To further test for epistasis and epistatic ratios, researchers carry out breeding programs to determine the ratio of offspring in the F2 generation, as seen in the seed colour paper. For fast-breeding organisms such as pea plants or insects, it is relatively easy to carry out breeding programs on a large scale. However, when the organisms have a longer reproductive cycle (e.g. humans or trees), this is not possible. This is especially important because many epistatic interactions lead to genetic diseases, and researching these diseases is vital for the medical industry.
These research projects can, thus, use our code to supplement their work. When breeding experiments are not possible, our code can instead generate the expected ratios. Even for projects that can conduct breeding experiments, our code can be used to further verify their results. Furthermore, our code can minimise human error when calculating ratios. For epistasis involving a large number of genes, deriving Punnett squares and ratios by hand is tedious and time-consuming, and therefore prone to error. Researchers can cross-reference the output of our code to check their work. Hence, we believe our code can be used by researchers testing for epistasis and that it will benefit them greatly.
Our project has succeeded in achieving the aims of calculating phenotypic ratios of any number of genes with a wide range of epistatic relationships. Despite the challenges and limitations that this program faces, it holds the potential to save a lot of resources for scientific studies and provide an efficient alternative to experimentation. Upon further improvement, the program will be able to facilitate many aspects of genetics including the modeling of diseases and epistatic analysis.
Campbell, R. F.; McGrath, P. T.; Paaby, A. B. Analysis of Epistasis in Natural Traits Using Model Organisms. Trends in Genetics 2018, 34 (11), 883–898. https://doi.org/10.1016/j.tig.2018.08.002.
Deviations from Mendelian Genetics: Types of Deviation Involving One Gene, Multiple Gene, and Other Factors https://www.embibe.com/exams/deviations-from-mendelian-genetics/.
Nethravathi Siri. Gene interaction -Complementary, Supplementary,Dominant Epistasis, R… https://www.slideshare.net/SIRIHG/gene-interaction-complementary-supplementarydominant-epistasis-recessive-epistasis-non-epistasis.
Epistasis and Its Effects on Phenotype | Learn Science at Scitable https://www.nature.com/scitable/topicpage/epistasis-gene-interaction-and-phenotype-effects-460/.
Malaviya, D. R.; Roy, A. K.; Kaushal, P.; Yadav, A.; Pandey, D. K. Complementary Gene Interaction and Xenia Effect Controls the Seed Coat Colour in Interspecific Cross between Trifolium Alexandrinum and T. Apertum. Genetica 2019, 147 (2), 197–203. https://doi.org/10.1007/s10709-019-00063-5.
Innark, P.; Panyanitikoon, H.; Khanobdee, C.; Samipak, S.; Jantasuriyarat, C. QTL Identification for Downy Mildew Resistance in Cucumber Using Genetic Linkage Map Based on SSR Markers. J Genet 2020, 99 (1), 81. https://doi.org/10.1007/s12041-020-01242-6.
Huang, B.; Wu, W.; Hong, Z. Genetic Interactions of Awnness Genes in Barley. Genes 2021, 12 (4), 606. https://doi.org/10.3390/genes12040606.
Suneetha, L. M.; Marsakatla, P.; Ravi, G. V.; Sykam, A.; Raju, R.; Reddy, P. P.; Hara Gopal, V. V.; Jadhav, R.; Suneetha, S. Phenotypic Characterization of a Pair of Molecules in Tissues Confer to Classical Mendelian or Non Mendelian Ratios. Medical Hypotheses 2016, 94, 112–117. https://doi.org/10.1016/j.mehy.2016.07.008.
Grammer, T. C.; Khokha, M. K.; Lane, M. A.; Lam, K.; Harland, R. M. Identification of Mutants in Inbred Xenopus Tropicalis. Mechanisms of Development 2005, 122 (3), 263–272. https://doi.org/10.1016/j.mod.2004.11.003.
Huang, L. S.; Sternberg, P. W. Genetic Dissection of Developmental Pathways; WormBook, 2006.
Hereford, L. M.; Hartwell, L. H. Sequential Gene Function in the Initiation of Saccharomyces Cerevisiae DNA Synthesis. Journal of Molecular Biology 1974, 84 (3), 445–461. https://doi.org/10.1016/0022-2836(74)90451-3.